claude-home/media-tools/scripts/CONTEXT.md
Cal Corum 4b7eca8a46
docs: add YAML frontmatter to all 151 markdown files
Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.

Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:00:44 -05:00


---
title: "Media Tools Scripts Reference"
description: "Usage reference for media download scripts including pokeflix_scraper.py CLI options, output structure, state file format, and guide for adding new scrapers."
type: reference
domain: media-tools
tags: [pokeflix, scraper, yt-dlp, playwright, cli, scripts]
---
# Media Tools Scripts
Operational scripts for media downloading and management.
## Scripts
### pokeflix_scraper.py
Downloads Pokemon episodes from pokeflix.tv.
**Dependencies:**
```bash
pip install playwright yt-dlp
playwright install chromium
```
**Quick Start:**
```bash
# Download entire season
python pokeflix_scraper.py \
  --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
  --output ~/Pokemon/

# Download episodes 1-10 only
python pokeflix_scraper.py \
  --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
  --output ~/Pokemon/ \
  --start 1 --end 10

# Resume interrupted download
python pokeflix_scraper.py \
  --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
  --output ~/Pokemon/ \
  --resume

# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
  --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
  --dry-run --verbose
```
**CLI Options:**
| Option | Description |
|--------|-------------|
| `--url, -u` | Season page URL (required) |
| `--output, -o` | Output directory (default: ~/Downloads/Pokemon) |
| `--start, -s` | First episode number to download |
| `--end, -e` | Last episode number to download |
| `--resume, -r` | Resume from previous state |
| `--dry-run, -n` | Extract URLs only, no download |
| `--headless` | Run browser without visible window |
| `--verbose, -v` | Enable debug logging |
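
For readers writing a sibling scraper, the options above could be declared with `argparse` roughly as follows. This is a reconstruction from the table only, not the script's actual code; the `build_parser` name, the types, and the defaults other than the documented output directory are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options table (assumed layout)."""
    parser = argparse.ArgumentParser(prog="pokeflix_scraper.py")
    parser.add_argument("--url", "-u", required=True, help="Season page URL")
    parser.add_argument(
        "--output", "-o", type=Path,
        default=Path("~/Downloads/Pokemon").expanduser(),
        help="Output directory",
    )
    # Assumption: an omitted start/end means "no lower/upper bound".
    parser.add_argument("--start", "-s", type=int, default=None, help="First episode number")
    parser.add_argument("--end", "-e", type=int, default=None, help="Last episode number")
    parser.add_argument("--resume", "-r", action="store_true", help="Resume from previous state")
    parser.add_argument("--dry-run", "-n", action="store_true", help="Extract URLs only")
    parser.add_argument("--headless", action="store_true", help="Run browser without a window")
    parser.add_argument("--verbose", "-v", action="store_true", help="Enable debug logging")
    return parser


args = build_parser().parse_args(
    ["-u", "https://www.pokeflix.tv/browse/pokemon-indigo-league", "-s", "1", "-e", "10", "-n"]
)
print(args.start, args.end, args.dry_run, args.headless)  # 1 10 True False
```

Using `store_true` for `--headless` keeps the visible browser as the default, which matches the recommendation in the performance notes.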
**Output Structure:**
```
~/Pokemon/
├── Pokemon Indigo League/
│   ├── E01 - Pokemon I Choose You.mp4
│   ├── E02 - Pokemon Emergency.mp4
│   ├── E03 - Ash Catches a Pokemon.mp4
│   └── download_state.json
```
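
The naming scheme in the tree above can be sketched as a small path helper. The `episode_path` function is hypothetical, and the two-digit zero padding is inferred from the example filenames rather than taken from the script.

```python
from pathlib import Path


def episode_path(output_dir: Path, season_name: str, number: int, title: str) -> Path:
    """Return <output>/<season>/E<NN> - <title>.mp4 (padding inferred from the examples)."""
    return output_dir / season_name / f"E{number:02d} - {title}.mp4"


path = episode_path(
    Path("~/Pokemon").expanduser(), "Pokemon Indigo League", 1, "Pokemon I Choose You"
)
print(path.name)  # E01 - Pokemon I Choose You.mp4
```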
**State File:**
The `download_state.json` file in each season directory tracks progress:
```json
{
  "season_url": "https://...",
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {
      "number": 1,
      "title": "Pokemon I Choose You",
      "page_url": "https://...",
      "video_url": "https://...",
      "downloaded": true,
      "error": null
    }
  },
  "last_updated": "2025-01-22T..."
}
```
```
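
Because the state file is plain JSON, it can be inspected outside the scraper, for example to see which episodes a `--resume` run would still need. The sketch below assumes only the fields shown above; `load_state` and `pending_episodes` are hypothetical helpers, not functions from `pokeflix_scraper.py`.

```python
import json
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read a download_state.json file into a dict."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def pending_episodes(state: dict) -> list[int]:
    """Return episode numbers not yet marked as downloaded, in ascending order."""
    return sorted(
        int(key)
        for key, episode in state.get("episodes", {}).items()
        if not episode.get("downloaded", False)
    )


# Inline example in the same shape as the state file (URLs left elided).
example_state = {
    "season_name": "Pokemon Indigo League",
    "episodes": {
        "1": {"number": 1, "title": "Pokemon I Choose You", "downloaded": True, "error": None},
        "2": {"number": 2, "title": "Pokemon Emergency", "downloaded": False, "error": "timeout"},
    },
}
print(pending_episodes(example_state))  # [2]
```

To run it against a real file, pass the path to `load_state`, for example `load_state(Path("~/Pokemon/Pokemon Indigo League/download_state.json").expanduser())`.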
## Adding New Scrapers
To add a scraper for a new site:
1. Copy the pattern from `pokeflix_scraper.py`
2. Modify the selectors for episode list extraction
3. Modify the iframe/video URL selectors for the new site's player
4. Test with `--dry-run` first
Key methods to customize:
- `get_season_info()` - Extract episode list from season page
- `extract_video_url()` - Get video URL from episode page
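
A minimal skeleton of the two methods might look like the sketch below. Everything except the method names is an assumption: the `Episode` dataclass, the `ExampleSiteScraper` class, the signatures, and the markup patterns are illustrative. The sketch also takes already-fetched HTML strings so the site-specific patterns can be tested without a browser, whereas the real script works through Playwright.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class Episode:
    number: int
    title: str
    page_url: str
    video_url: Optional[str] = None


class ExampleSiteScraper:
    """Hypothetical scraper for an imaginary site; adjust the patterns per site."""

    # Assumed markup: <a class="episode" href="/v/ep-1">1 - Some Title</a>
    EPISODE_LINK = re.compile(
        r'<a[^>]*class="episode"[^>]*href="([^"]+)"[^>]*>\s*(\d+)\s*-\s*([^<]+)</a>'
    )
    # Assumed markup: <iframe src="https://player.example.invalid/embed/1">
    PLAYER_IFRAME = re.compile(r'<iframe[^>]*src="([^"]+)"')

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def get_season_info(self, season_html: str) -> list[Episode]:
        """Extract the episode list from the season page, sorted by number."""
        episodes = [
            Episode(int(num), title.strip(), urljoin(self.base_url, href))
            for href, num, title in self.EPISODE_LINK.findall(season_html)
        ]
        return sorted(episodes, key=lambda ep: ep.number)

    def extract_video_url(self, episode_html: str) -> Optional[str]:
        """Return the player iframe URL from an episode page, or None if absent."""
        match = self.PLAYER_IFRAME.search(episode_html)
        return match.group(1) if match else None


scraper = ExampleSiteScraper("https://example.invalid/season")
season_html = (
    '<a class="episode" href="/v/ep-2">2 - Second</a>'
    '<a class="episode" href="/v/ep-1">1 - First</a>'
)
episodes = scraper.get_season_info(season_html)
print([(ep.number, ep.title) for ep in episodes])  # [(1, 'First'), (2, 'Second')]
```

Keeping the site-specific patterns as class attributes means a new scraper mostly changes those two constants, which is the part steps 2 and 3 above ask you to modify.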
## Performance Notes
- **Non-headless mode** (the default) is recommended to avoid anti-bot detection
- Random delays (2-5s) between requests reduce the risk of rate limiting
- Large seasons (80+ episodes) may take hours; use `--resume` to continue if a run is interrupted
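
The random delay between requests can be sketched in a few lines. `polite_delay` is a hypothetical helper, and the uniform distribution is an assumption; the source only states the 2-5 second range. The `sleep` parameter is injectable so the example and any tests do not actually wait.

```python
import random
import time
from typing import Callable


def polite_delay(
    min_s: float = 2.0,
    max_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random duration in [min_s, max_s] and return the chosen delay."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


# Demonstration with a fake sleep that records the delay instead of waiting.
recorded: list[float] = []
chosen = polite_delay(sleep=recorded.append)
print(2.0 <= chosen <= 5.0)  # True
```

In a scraper loop, a call to `polite_delay()` would sit between consecutive episode page requests.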