claude-home/media-tools/scripts/CONTEXT.md

---
title: "Media Tools Scripts Reference"
description: "Usage reference for media download scripts including pokeflix_scraper.py CLI options, output structure, state file format, and guide for adding new scrapers."
type: reference
domain: media-tools
tags: [pokeflix, scraper, yt-dlp, playwright, cli, scripts]
---

# Media Tools Scripts

Operational scripts for media downloading and management.

## Scripts

### pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv.

**Dependencies:**
```bash
pip install playwright yt-dlp
playwright install chromium
```

**Quick Start:**
```bash
# Download entire season
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/

# Download episodes 1-10 only
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --start 1 --end 10

# Resume interrupted download
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --resume

# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --dry-run --verbose
```

**CLI Options:**

| Option | Description |
|--------|-------------|
| `--url`, `-u` | Season page URL (required) |
| `--output`, `-o` | Output directory (default: `~/Downloads/Pokemon`) |
| `--start`, `-s` | First episode number to download |
| `--end`, `-e` | Last episode number to download |
| `--resume`, `-r` | Resume from the state saved by a previous run |
| `--dry-run`, `-n` | Extract URLs only, no download |
| `--headless` | Run the browser without a visible window (off by default) |
| `--verbose`, `-v` | Enable debug logging |

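The table above maps naturally onto an `argparse` parser. The following is a minimal sketch reconstructed from the table only, not a copy of the script's actual parser; the function name `build_parser`, the argument types, and the help strings are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options."""
    parser = argparse.ArgumentParser(
        description="Download episodes listed on a season page."
    )
    parser.add_argument("--url", "-u", required=True, help="Season page URL")
    parser.add_argument(
        "--output", "-o",
        type=Path,
        default=Path.home() / "Downloads" / "Pokemon",
        help="Output directory",
    )
    # Assumed to be integers; None means "no bound on this side".
    parser.add_argument("--start", "-s", type=int, default=None,
                        help="First episode number to download")
    parser.add_argument("--end", "-e", type=int, default=None,
                        help="Last episode number to download")
    parser.add_argument("--resume", "-r", action="store_true",
                        help="Resume from previous state")
    parser.add_argument("--dry-run", "-n", action="store_true",
                        help="Extract URLs only, no download")
    parser.add_argument("--headless", action="store_true",
                        help="Run browser without visible window")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable debug logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` exposes `--dry-run` as `args.dry_run`, and that boolean flags default to `False`, which matches non-headless mode being the default.
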
**Output Structure:**
```
~/Pokemon/
├── Pokemon Indigo League/
│   ├── E01 - Pokemon I Choose You.mp4
│   ├── E02 - Pokemon Emergency.mp4
│   ├── E03 - Ash Catches a Pokemon.mp4
│   └── download_state.json
```
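
The naming scheme shown above (`E01 - Title.mp4` inside a season directory) can be sketched as a small helper. This is an illustration only: the function name `episode_path` and the set of characters stripped from titles are assumptions, and the script's real sanitization rules may differ.

```python
import re
from pathlib import Path


def episode_path(output_dir: Path, season_name: str, number: int, title: str) -> Path:
    """Build '<output>/<season>/E<NN> - <title>.mp4' (hypothetical helper)."""
    # Assumption: drop characters that are invalid in filenames on common filesystems.
    safe_title = re.sub(r'[<>:"/\\|?*]', "", title).strip()
    # Zero-pad the episode number to two digits, as in the listing above.
    return output_dir / season_name / f"E{number:02d} - {safe_title}.mp4"


if __name__ == "__main__":
    print(episode_path(Path.home() / "Pokemon", "Pokemon Indigo League", 1,
                       "Pokemon I Choose You"))
```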

**State File:**

The `download_state.json` file in each season directory tracks per-episode progress:
```json
{
  "season_url": "https://...",
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {
      "number": 1,
      "title": "Pokemon I Choose You",
      "page_url": "https://...",
      "video_url": "https://...",
      "downloaded": true,
      "error": null
    }
  },
  "last_updated": "2025-01-22T..."
}
```
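
Because the state file is plain JSON, it is easy to inspect outside the scraper. The sketch below lists episodes that are not yet marked as downloaded, assuming the structure shown above; the helper names `load_state` and `pending_episodes` are hypothetical and are not part of `pokeflix_scraper.py`.

```python
import json
from pathlib import Path
from typing import List


def load_state(path: Path) -> dict:
    """Read a download_state.json file into a dict."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def pending_episodes(state: dict) -> List[int]:
    """Return episode numbers whose 'downloaded' flag is not true, ascending."""
    return sorted(
        ep["number"]
        for ep in state.get("episodes", {}).values()
        if not ep.get("downloaded")
    )


if __name__ == "__main__":
    state_file = Path.home() / "Pokemon" / "Pokemon Indigo League" / "download_state.json"
    if state_file.exists():
        print("Still to download:", pending_episodes(load_state(state_file)))
```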

## Adding New Scrapers

To add a scraper for a new site:

1. Copy the pattern from `pokeflix_scraper.py`
2. Modify the selectors for episode list extraction
3. Modify the iframe/video URL selectors for the new site's player
4. Test with `--dry-run` first to confirm that video URLs are extracted before downloading anything

Key methods to customize:
- `get_season_info()` - Extract episode list from season page
- `extract_video_url()` - Get video URL from episode page
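
One way to picture the two customization points is as an abstract interface that each site-specific scraper fills in. The sketch below is an assumption about structure, not the layout of `pokeflix_scraper.py`: the `Episode` dataclass mirrors the state file fields, while `BaseScraper`, `ExampleSiteScraper`, `dry_run()`, and the method signatures are hypothetical. The example subclass returns canned data so the sketch runs without a browser.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    # Fields mirror the entries in download_state.json.
    number: int
    title: str
    page_url: str
    video_url: Optional[str] = None
    downloaded: bool = False
    error: Optional[str] = None


class BaseScraper(ABC):
    """Hypothetical interface; signatures are assumed, not taken from the script."""

    @abstractmethod
    def get_season_info(self, season_url: str) -> List[Episode]:
        """Extract the episode list from a season page."""

    @abstractmethod
    def extract_video_url(self, episode: Episode) -> Optional[str]:
        """Get the video URL from an episode page."""

    def dry_run(self, season_url: str) -> List[Episode]:
        """Resolve video URLs without downloading, recording any errors."""
        episodes = self.get_season_info(season_url)
        for ep in episodes:
            try:
                ep.video_url = self.extract_video_url(ep)
            except Exception as exc:  # keep going so one bad page does not stop the run
                ep.error = str(exc)
        return episodes


class ExampleSiteScraper(BaseScraper):
    """Stand-in for a new site; a real subclass would query the page's selectors."""

    def get_season_info(self, season_url: str) -> List[Episode]:
        return [Episode(n, f"Episode {n}", f"{season_url}/{n}") for n in (1, 2)]

    def extract_video_url(self, episode: Episode) -> Optional[str]:
        return f"{episode.page_url}/video.mp4"


if __name__ == "__main__":
    for ep in ExampleSiteScraper().dry_run("https://example.com/season"):
        print(ep.number, ep.video_url, ep.error)
```

Keeping the site-specific logic behind these two methods means the resume, state, and download handling would not need to change per site.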

## Performance Notes

- **Non-headless mode** (the default) is recommended because a visible browser window is less likely to trigger anti-bot detection
- Random delays (2-5s) between requests reduce the risk of rate limiting
- Large seasons (80+ episodes) may take hours; use `--resume` to continue if a run is interrupted
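
The random delay between requests can be sketched with the standard library. This is a minimal illustration of the 2-5s pause described above; the function name `polite_sleep` and the injectable `sleep` parameter are assumptions made so the helper is easy to test, not the script's actual implementation.

```python
import random
import time
from typing import Callable


def polite_sleep(
    min_s: float = 2.0,
    max_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Pause for a random duration between min_s and max_s seconds."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay  # returned so callers can log how long they waited


if __name__ == "__main__":
    waited = polite_sleep()
    print(f"Waited {waited:.1f}s before the next request")
```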