---
title: "Media Tools Troubleshooting"
description: "Solutions for Playwright browser automation failures, yt-dlp download errors, scraping issues, and resume/state file problems in media download tools."
type: troubleshooting
domain: media-tools
tags: [yt-dlp, playwright, scraping, debugging, video-download]
---

# Media Tools Troubleshooting

## Common Issues

### Playwright Issues

#### "playwright not installed" Error

```
ERROR: playwright not installed. Run: pip install playwright && playwright install chromium
```

**Solution:**

```bash
# Install the Python package
pip install playwright
# Download the Chromium build that Playwright drives
playwright install chromium
```

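The error text suggests the script checks for Playwright before it launches a browser. Below is a minimal sketch of such a startup check. It assumes nothing about how the real script does this, and `playwright_available` and `require_playwright` are hypothetical names:

```python
import importlib.util
import sys


# Hypothetical helpers; the real script may perform this check differently.
def playwright_available() -> bool:
    """Return True if this interpreter can import the playwright package."""
    # find_spec locates the package without importing it
    return importlib.util.find_spec("playwright") is not None


def require_playwright() -> None:
    """Exit with an install hint when playwright is missing."""
    if not playwright_available():
        print(
            "ERROR: playwright not installed. "
            "Run: pip install playwright && playwright install chromium",
            file=sys.stderr,
        )
        sys.exit(1)
```

Call `require_playwright()` before importing `playwright.sync_api`. The lookup runs in the interpreter that executes the script. So if you still get a "not installed" error right after a successful `pip install`, the `pip` you ran probably belongs to a different Python. `python -m pip install playwright` installs into the interpreter you name.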
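#### Browser Launch Fails

```
Error: Executable doesn't exist at /home/user/.cache/ms-playwright/chromium-xxx/chrome-linux/chrome
```

**Cause:** The Playwright package is installed, but the browser build it expects has not been downloaded. This can also happen after you upgrade Playwright, because each release pins a specific browser build.

**Solution:**

```bash
playwright install chromium
```
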
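To see which browser builds Playwright has actually downloaded, you can list its cache directory. This sketch assumes the default Linux location shown in the error (`~/.cache/ms-playwright`). The `PLAYWRIGHT_BROWSERS_PATH` environment variable overrides that location. The helper names are hypothetical:

```python
import os
from pathlib import Path


def browsers_dir() -> Path:
    """Browser cache location Playwright uses on Linux (hypothetical helper).

    PLAYWRIGHT_BROWSERS_PATH overrides the default; its special value "0"
    (browsers stored inside the package) is not handled in this sketch.
    """
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    if override and override != "0":
        return Path(override).expanduser()
    return Path.home() / ".cache" / "ms-playwright"


def installed_chromium_builds(base: Path) -> list[str]:
    """Return the sorted chromium-* build directory names under base."""
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.glob("chromium-*") if p.is_dir())
```

Run `installed_chromium_builds(browsers_dir())`. If it returns an empty list, or only builds whose numbers differ from the one in the error message, the expected build was never downloaded and `playwright install chromium` needs to be rerun.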
#### Timeout Errors

```
TimeoutError: Timeout 30000ms exceeded
```

**Causes:**

- Slow network connection
- Site is blocking automated access
- Page structure has changed, so the element the script waits for never appears

**Solutions:**

1. Increase the timeout in the script (30000 ms is Playwright's default)
2. Run without the `--headless` flag to watch what the page does
3. Open the site manually to check that it is up

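One way to increase the timeout without slowing down every run is to retry a step with a longer timeout only when it actually times out. Below is a minimal, library-agnostic sketch. `retry_with_longer_timeouts` is a hypothetical helper, the timeout steps are arbitrary, and you pass in the exception type to catch. With Playwright's sync API, that type is `playwright.sync_api.TimeoutError`, and the action can wrap `page.goto`, which takes its `timeout` in milliseconds. If you want to raise the default for every call instead, use `page.set_default_timeout(ms)`.

```python
from typing import Callable, Sequence, Type, TypeVar

T = TypeVar("T")


def retry_with_longer_timeouts(
    action: Callable[[int], T],
    timeouts_ms: Sequence[int] = (30_000, 60_000, 120_000),
    timeout_exc: Type[BaseException] = TimeoutError,
) -> T:
    """Run action(timeout_ms) with each timeout in turn until one attempt succeeds.

    Only timeout_exc triggers another attempt; other exceptions propagate at once.
    If every attempt times out, the last timeout error is re-raised.
    """
    if not timeouts_ms:
        raise ValueError("timeouts_ms must not be empty")
    last_error = None
    for ms in timeouts_ms:
        try:
            return action(ms)
        except timeout_exc as err:
            print(f"Timed out after {ms} ms")
            last_error = err
    raise last_error


# Playwright usage (sync API), assuming `page` and `url` already exist:
#   from playwright.sync_api import TimeoutError as PlaywrightTimeout
#   retry_with_longer_timeouts(lambda ms: page.goto(url, timeout=ms),
#                              timeout_exc=PlaywrightTimeout)
```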

---

### yt-dlp Issues

#### "yt-dlp not found" Error

```
yt-dlp not found. Install with: pip install yt-dlp
```

**Solution:**

```bash
pip install yt-dlp
```

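Sometimes `pip install yt-dlp` succeeds but the command is still not found. In that case, pip's script directory is probably not on your `PATH`. yt-dlp can also run as a module (`python -m yt_dlp`), so a script can fall back to that. Below is a sketch with a hypothetical helper name. It assumes the script calls yt-dlp as a subprocess:

```python
import importlib.util
import shutil
import sys
from typing import Optional


def yt_dlp_command() -> Optional[list[str]]:
    """Return an argv prefix that runs yt-dlp, or None if it is not installed.

    Hypothetical helper; the real script may locate yt-dlp differently.
    """
    exe = shutil.which("yt-dlp")
    if exe:
        # The yt-dlp executable is on PATH
        return [exe]
    if importlib.util.find_spec("yt_dlp") is not None:
        # Installed for this interpreter, but its console script is not on PATH
        return [sys.executable, "-m", "yt_dlp"]
    return None
```

Usage: `subprocess.run(yt_dlp_command() + ["--version"], check=True)` after checking that the result is not `None`.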
#### Download Fails for Specific Host

```
ERROR: Unsupported URL: https://somehost.com/...
```

**Solution:**

```bash
# Update yt-dlp to the latest version
pip install -U yt-dlp
```

If it still fails after updating, the host is probably not supported. Check the [yt-dlp supported sites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md) list.

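Site extractors break often, so an outdated yt-dlp is the first thing to suspect. yt-dlp version strings are release dates (`YYYY.MM.DD`; nightly builds may add a further numeric part), so the output of `yt-dlp --version` tells you roughly how old your copy is. Below is a sketch. The helper names are hypothetical, and the 60-day threshold is an arbitrary assumption:

```python
from datetime import date
from typing import Optional


def release_date(version: str) -> date:
    """Parse a yt-dlp version such as '2024.08.06'; extra parts are ignored."""
    year, month, day = (int(part) for part in version.strip().split(".")[:3])
    return date(year, month, day)


def version_age_days(version: str, today: Optional[date] = None) -> int:
    """Days between the release date encoded in the version and today."""
    return ((today or date.today()) - release_date(version)).days


def looks_outdated(version: str, max_age_days: int = 60, today: Optional[date] = None) -> bool:
    # Arbitrary threshold: builds older than about two months are worth updating
    return version_age_days(version, today) > max_age_days
```

For example, `version_age_days("2024.08.06", today=date(2024, 9, 5))` returns `30`. Feed it the `stdout` of `subprocess.run(["yt-dlp", "--version"], capture_output=True, text=True)`.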
#### Slow Downloads

**Causes:**

- Throttling by the video host
- Network issues

**Solutions:**

- Download speed is usually limited by the source server
- Try again at a different time of day

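Some videos are served as HLS or DASH streams (many small fragments rather than one file). For those, yt-dlp can fetch several fragments at once with `--concurrent-fragments` (short form `-N`). The option only affects fragmented downloads. Below is a sketch of building such a command. The helper name and the default of 4 fragments are assumptions, not values from the scripts:

```python
import os


def build_download_cmd(url: str, out_dir: str, fragments: int = 4) -> list[str]:
    """Build a yt-dlp argv that downloads HLS/DASH fragments in parallel.

    Hypothetical helper; 4 parallel fragments is an arbitrary default.
    """
    if fragments < 1:
        raise ValueError("fragments must be at least 1")
    return [
        "yt-dlp",
        # Only affects fragmented (HLS/DASH) downloads; single files are unaffected
        "--concurrent-fragments", str(fragments),
        # yt-dlp output template: <out_dir>/<title>.<ext>
        "-o", os.path.join(out_dir, "%(title)s.%(ext)s"),
        url,
    ]


# Example: subprocess.run(build_download_cmd(url, "/path/to/season"), check=True)
```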

---

### Scraping Issues

#### No Episodes Found

```
No episodes found!
```

**Causes:**

- Site structure has changed
- Page requires authentication
- Cloudflare protection was triggered

**Solutions:**

1. Run without `--headless` to see what the browser actually loads
2. Check that the URL is correct and opens manually in a browser
3. The site may have changed its HTML structure; check the selectors in the script

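To tell these causes apart, save the HTML the script actually received and look for tell-tale markers. Below is a heuristic sketch. The marker strings are assumptions: Cloudflare's challenge page is commonly titled "Just a moment...", and login walls usually contain a password field. A miss proves nothing. `diagnose_empty_page` is a hypothetical helper:

```python
def diagnose_empty_page(html: str) -> str:
    """Guess why no episodes were found from markers in the page HTML (heuristic)."""
    text = html.lower()
    # Assumed Cloudflare challenge markers; the exact wording changes over time
    cloudflare_markers = ("just a moment...", "cf-challenge", "challenges.cloudflare.com")
    if any(marker in text for marker in cloudflare_markers):
        return "cloudflare"
    # A password input usually means the page is behind a login
    if 'type="password"' in text or "type='password'" in text:
        return "login"
    # The page loaded normally, so the HTML structure has probably changed
    return "selectors"
```

With Playwright, `page.content()` returns the current page HTML to pass in.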
#### Video URL Not Found

```
No video URL found for episode X
```

**Causes:**

- Video is on an unsupported host
- Page uses a non-standard embedding method
- Anti-bot protection on the video player

**Solutions:**

1. Run with `--verbose` to see which URLs are being tried
2. Open the episode manually and check the DevTools Network tab for video requests
3. You may need to add iframe selectors for the specific host

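The embedded player is usually an `<iframe>`. Listing the iframe sources on an episode page shows which host is in use and whether it is one of the hosts in the Known Video Hosts section below. Below is a stdlib-only sketch. The helper names are hypothetical. Matching on a name fragment of the hostname is an assumption, because these hosts change domains often:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Name fragments of hosts listed under Known Video Hosts (assumed to appear in hostnames)
KNOWN_HOST_HINTS = ("streamtape", "vidoza", "mp4upload", "dood", "filemoon", "voe")


class IframeCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative and protocol-relative URLs against the page URL
                self.srcs.append(urljoin(self.base_url, src))


def iframe_sources(html: str, base_url: str) -> list[tuple[str, bool]]:
    """Return (iframe URL, looks like a known host) pairs in page order."""
    parser = IframeCollector(base_url)
    parser.feed(html)
    results = []
    for src in parser.srcs:
        host = (urlparse(src).hostname or "").lower()
        results.append((src, any(hint in host for hint in KNOWN_HOST_HINTS)))
    return results
```

With Playwright, pass `page.content()` as the HTML and `page.url` as the base URL.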
#### 403 Forbidden on Site

**Cause:** The site is blocking automated requests.

**Solutions:**

1. Make sure you are NOT using `--headless`
2. Increase the random delays between requests
3. Clear browser cache and cookies (restart the script)
4. Try from a different IP address

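Random delays make request timing look less mechanical. Below is a minimal sketch of a jittered pause between page loads. The helper name and the 3 to 8 second range are assumptions, not the script's actual settings. The `sleep` and `rng` parameters exist only so the function can be tested without waiting:

```python
import random
import time
from typing import Callable, Optional


def polite_pause(
    min_s: float = 3.0,
    max_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> float:
    """Sleep for a random duration in [min_s, max_s] seconds and return it.

    Hypothetical helper; the default range is arbitrary.
    """
    if min_s < 0 or max_s < min_s:
        raise ValueError("need 0 <= min_s <= max_s")
    delay = (rng or random).uniform(min_s, max_s)
    sleep(delay)
    return delay


# In a Playwright loop, call polite_pause() between page.goto(...) calls.
```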

---

### Resume Issues

#### Resume Not Working

**Symptom:** Episodes that were already downloaded are downloaded again instead of being skipped.

**Check:**

1. Ensure `download_state.json` exists in the output directory
2. Verify that the `--resume` flag is being used
3. Check that episode numbers match between runs

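The first check can be scripted. This sketch makes no assumption about the state file's internal layout. It reports only three things: whether `download_state.json` exists in the output directory, whether it parses, and which top-level keys it has. You can then compare those keys with the episodes the current run expects. `inspect_state` is a hypothetical helper:

```python
import json
from pathlib import Path


def inspect_state(output_dir: str) -> str:
    """Describe download_state.json in output_dir without assuming its schema.

    Hypothetical helper; it only reads the file.
    """
    path = Path(output_dir).expanduser() / "download_state.json"
    if not path.is_file():
        return f"missing: {path} (resume has nothing to read)"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        return f"corrupt: {path} ({err.msg} at line {err.lineno})"
    except UnicodeDecodeError:
        return f"corrupt: {path} (not UTF-8 text)"
    if isinstance(data, dict):
        return f"ok: {len(data)} top-level keys: {', '.join(sorted(map(str, data)))}"
    return f"ok: top-level {type(data).__name__}"
```

Usage: `print(inspect_state("/path/to/season"))`.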
#### Corrupt State File

```
JSONDecodeError: ...
```

**Cause:** `download_state.json` exists but is not valid JSON, so it cannot be loaded.

**Solution:**

```bash
# Delete the state file to start fresh
rm /path/to/season/download_state.json
```

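Deleting the file discards its contents for good. A gentler variant, sketched below, renames an unreadable state file aside. The next run still starts fresh, and the old contents stay available for inspection. The `.corrupt` suffix and the helper name are assumptions. Pass it the same path you would give to `rm`:

```python
import json
import time
from pathlib import Path
from typing import Optional


def quarantine_if_corrupt(state_path: str) -> Optional[Path]:
    """Rename state_path aside if it is not valid JSON; return the new path or None.

    Hypothetical helper; valid or missing files are left untouched.
    """
    path = Path(state_path).expanduser()
    if not path.is_file():
        return None
    try:
        json.loads(path.read_text(encoding="utf-8"))
        return None  # valid JSON, nothing to do
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        stamp = time.strftime("%Y%m%d-%H%M%S")
        target = path.with_name(f"{path.name}.{stamp}.corrupt")
        path.rename(target)
        return target


# Example: quarantine_if_corrupt("/path/to/season/download_state.json")
```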

---

## Debug Mode

Run with verbose output:

```bash
python pokeflix_scraper.py --url "..." --output ~/Pokemon/ --verbose
```

Do a dry run to test URL extraction:

```bash
python pokeflix_scraper.py --url "..." --dry-run --verbose
```

Watch the browser (non-headless):

```bash
# Headless is off by default, so just leave out --headless
python pokeflix_scraper.py --url "..." --output ~/Pokemon/
```


---

## Manual Workarounds

### If Automated Extraction Fails

1. **Browser DevTools method:**
   - Open episode in browser
   - F12 → Network tab → filter "m3u8" or "mp4"
   - Play video, copy the stream URL
   - Download manually: `yt-dlp "URL"`

2. **Check iframe manually:**
   - Right-click video player → Inspect
   - Find `<iframe>` element
   - Copy `src` attribute
   - Use that URL with yt-dlp

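If a page makes many requests, it can be easier to export them and filter offline. The DevTools Network tab in Chrome and Firefox can save all requests as a HAR file. This is a JSON log whose `log.entries[].request.url` fields hold each request URL. Below is a sketch that pulls `.m3u8` and `.mp4` URLs out of such a file for use with yt-dlp. The helper names are hypothetical, and `episode.har` in the usage line is a placeholder:

```python
import json
from urllib.parse import urlparse

STREAM_EXTENSIONS = (".m3u8", ".mp4")


def stream_urls_from_har(har: dict) -> list[str]:
    """Return unique request URLs whose path ends in .m3u8 or .mp4, in request order."""
    seen: set[str] = set()
    found: list[str] = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        # Check the path only, so query strings (tokens, expiry) don't hide the extension
        if urlparse(url).path.lower().endswith(STREAM_EXTENSIONS) and url not in seen:
            seen.add(url)
            found.append(url)
    return found


def load_har(path: str) -> dict:
    """Read a HAR file exported from DevTools."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Usage: `for url in stream_urls_from_har(load_har("episode.har")): print(url)`, then pass a candidate URL to `yt-dlp "URL"`.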
### Known Video Hosts

These hosts are typically supported by yt-dlp:

- Streamtape
- Vidoza
- Mp4upload
- Doodstream
- Filemoon
- Voe.sx

If the video is on an unsupported host, check whether the episode page offers an alternative server or quality option.