Adds title, description, type, domain, and tags frontmatter to every doc for improved KB semantic search. The description field is prepended to every search chunk, and domain/type/tags enable filtered queries.
Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| title | description | type | domain | tags |
|---|---|---|---|---|
| Media Tools Scripts Reference | Usage reference for media download scripts including pokeflix_scraper.py CLI options, output structure, state file format, and guide for adding new scrapers. | reference | media-tools | |
Media Tools Scripts
Operational scripts for media downloading and management.
Scripts
pokeflix_scraper.py
Downloads Pokemon episodes from pokeflix.tv.
Dependencies:
pip install playwright yt-dlp
playwright install chromium
Quick Start:
# Download entire season
python pokeflix_scraper.py \
--url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
--output ~/Pokemon/
# Download episodes 1-10 only
python pokeflix_scraper.py \
--url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
--output ~/Pokemon/ \
--start 1 --end 10
# Resume interrupted download
python pokeflix_scraper.py \
--url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
--output ~/Pokemon/ \
--resume
# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
--url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
--dry-run --verbose
CLI Options:
| Option | Description |
|---|---|
| --url, -u | Season page URL (required) |
| --output, -o | Output directory (default: ~/Downloads/Pokemon) |
| --start, -s | First episode number to download |
| --end, -e | Last episode number to download |
| --resume, -r | Resume from previous state |
| --dry-run, -n | Extract URLs only, no download |
| --headless | Run browser without visible window |
| --verbose, -v | Enable debug logging |
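For orientation, the option table can be mirrored with argparse roughly as follows. This is a minimal sketch built only from the documented flags, not the script's actual source; the build_parser name, the argument types, and the help strings are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="pokeflix_scraper.py")
    parser.add_argument("--url", "-u", required=True, help="Season page URL")
    # The default is left unexpanded here; call .expanduser() before using it.
    parser.add_argument("--output", "-o", type=Path,
                        default=Path("~/Downloads/Pokemon"),
                        help="Output directory")
    parser.add_argument("--start", "-s", type=int, default=None,
                        help="First episode number to download")
    parser.add_argument("--end", "-e", type=int, default=None,
                        help="Last episode number to download")
    parser.add_argument("--resume", "-r", action="store_true",
                        help="Resume from previous state")
    parser.add_argument("--dry-run", "-n", action="store_true",
                        help="Extract URLs only, no download")
    parser.add_argument("--headless", action="store_true",
                        help="Run browser without visible window")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable debug logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--url", "https://www.pokeflix.tv/browse/pokemon-indigo-league",
    "--start", "1", "--end", "10", "--dry-run",
])
print(args.url, args.start, args.end, args.dry_run)
```

Because --headless is a store_true flag in this sketch, omitting it leaves the browser visible, which matches the documented default.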
Output Structure:
~/Pokemon/
├── Pokemon Indigo League/
│ ├── E01 - Pokemon I Choose You.mp4
│ ├── E02 - Pokemon Emergency.mp4
│ ├── E03 - Ash Catches a Pokemon.mp4
│ └── download_state.json
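The layout above suggests a naming rule of season folder, then "E" plus a two-digit episode number, then the title. A small sketch of that rule, assuming the title has already been cleaned of characters the filesystem rejects; the episode_path helper is hypothetical and the script's own logic may differ, particularly for seasons with 100 or more episodes:

```python
from pathlib import Path


def episode_path(output_dir, season_name: str, number: int, title: str) -> Path:
    """Build <output_dir>/<season_name>/E<NN> - <title>.mp4 (assumed naming rule)."""
    # Two-digit zero padding matches the listing above; wider numbers pass through unpadded.
    filename = f"E{number:02d} - {title}.mp4"
    return Path(output_dir) / season_name / filename


print(episode_path("~/Pokemon", "Pokemon Indigo League", 1, "Pokemon I Choose You"))
```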
State File:
The download_state.json file in each season directory tracks progress:
{
"season_url": "https://...",
"season_name": "Pokemon Indigo League",
"episodes": {
"1": {
"number": 1,
"title": "Pokemon I Choose You",
"page_url": "https://...",
"video_url": "https://...",
"downloaded": true,
"error": null
}
},
"last_updated": "2025-01-22T..."
}
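Since the state file is plain JSON, it can be inspected outside the scraper. The sketch below lists episodes that are not yet marked as downloaded, relying only on the fields shown above; the load_state and pending_episodes helpers are hypothetical and not part of the script.

```python
import json
from pathlib import Path
from typing import List


def load_state(state_path) -> dict:
    """Read download_state.json into a dict."""
    return json.loads(Path(state_path).read_text(encoding="utf-8"))


def pending_episodes(state: dict) -> List[int]:
    """Return episode numbers whose "downloaded" flag is missing or false, ascending."""
    pending = [
        int(key)  # episode keys are strings such as "1" in the state file
        for key, episode in state.get("episodes", {}).items()
        if not episode.get("downloaded")
    ]
    return sorted(pending)


# Example with an in-memory state shaped like the file above.
example_state = {
    "season_name": "Pokemon Indigo League",
    "episodes": {
        "1": {"number": 1, "downloaded": True, "error": None},
        "2": {"number": 2, "downloaded": False, "error": None},
    },
}
print(pending_episodes(example_state))
```

To check a real run, pass the result of load_state() for the season's download_state.json to pending_episodes().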
Adding New Scrapers
To add a scraper for a new site:
- Copy the pattern from pokeflix_scraper.py
- Modify the selectors for episode list extraction
- Modify the iframe/video URL selectors for the new site's player
- Test with --dry-run first
Key methods to customize:
- get_season_info() - Extract episode list from season page
- extract_video_url() - Get video URL from episode page
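One way to picture the split between the two methods is a skeleton like the one below. Only the method names get_season_info() and extract_video_url() come from this document; the SiteScraper base class, the Episode dataclass, the collect() helper, the method signatures, and the canned data are assumptions for illustration, and pokeflix_scraper.py itself may be organized differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    # Fields mirror one entry under "episodes" in download_state.json.
    number: int
    title: str
    page_url: str
    video_url: Optional[str] = None
    downloaded: bool = False
    error: Optional[str] = None


class SiteScraper(ABC):
    """Hypothetical base class; not taken from the real script."""

    @abstractmethod
    def get_season_info(self, season_url: str) -> List[Episode]:
        """Return the episodes listed on the season page."""

    @abstractmethod
    def extract_video_url(self, episode: Episode) -> str:
        """Return the direct video URL for one episode page."""

    def collect(self, season_url: str) -> List[Episode]:
        """Resolve a video URL for every episode, recording failures per episode."""
        episodes = self.get_season_info(season_url)
        for episode in episodes:
            try:
                episode.video_url = self.extract_video_url(episode)
            except Exception as exc:  # keep going so one bad page does not stop the run
                episode.error = str(exc)
        return episodes


class ExampleSiteScraper(SiteScraper):
    """Stand-in for a new site: returns canned data instead of driving a browser."""

    def get_season_info(self, season_url: str) -> List[Episode]:
        # A real scraper would load season_url and apply the site's episode-list selectors.
        return [
            Episode(number=1, title="First Episode", page_url=f"{season_url}/1"),
            Episode(number=2, title="Second Episode", page_url=f"{season_url}/2"),
        ]

    def extract_video_url(self, episode: Episode) -> str:
        # A real scraper would open episode.page_url and read the player's iframe/video source.
        if episode.number == 2:
            raise ValueError("player iframe not found")
        return f"{episode.page_url}/video.mp4"
```

Keeping the site-specific selectors inside these two methods means the download, resume, and state-tracking logic would not need to change per site.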
Performance Notes
- Non-headless mode (the default) is recommended because a visible browser is less likely to trigger anti-bot detection
- Random delays (2-5s) between requests reduce the risk of rate limiting
- Large seasons (80+ episodes) may take hours - use --resume if interrupted
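The random delay between requests can be sketched as below. The 2-5 second range comes from the note above; the polite_pause name and the injectable sleep parameter are assumptions made so the helper can be exercised without actually waiting.

```python
import random
import time


def polite_pause(min_seconds: float = 2.0, max_seconds: float = 5.0, sleep=time.sleep) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Demonstration with a no-op sleep so the example returns immediately.
chosen = polite_pause(sleep=lambda seconds: None)
print(f"would have waited {chosen:.1f}s")
```

Calling polite_pause() with no arguments between episode requests gives the jittered spacing described in the notes.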