claude-home/media-tools/scripts/CONTEXT.md
Cal Corum 4b7eca8a46
2026-03-12 09:00:44 -05:00


title: Media Tools Scripts Reference
description: Usage reference for media download scripts including pokeflix_scraper.py CLI options, output structure, state file format, and guide for adding new scrapers.
type: reference
domain: media-tools
tags: pokeflix, scraper, yt-dlp, playwright, cli, scripts

Media Tools Scripts

Operational scripts for media downloading and management.

Scripts

pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv.

Dependencies:

pip install playwright yt-dlp
playwright install chromium

Quick Start:

# Download entire season
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/

# Download episodes 1-10 only
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --start 1 --end 10

# Resume interrupted download
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --resume

# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --dry-run --verbose

CLI Options:

| Option | Description |
|--------|-------------|
| --url, -u | Season page URL (required) |
| --output, -o | Output directory (default: ~/Downloads/Pokemon) |
| --start, -s | First episode number to download |
| --end, -e | Last episode number to download |
| --resume, -r | Resume from previous state |
| --dry-run, -n | Extract URLs only, no download |
| --headless | Run browser without visible window |
| --verbose, -v | Enable debug logging |
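
For readers adapting the script, the option set above could be declared with argparse roughly as follows. This is a minimal sketch, not the actual parser in pokeflix_scraper.py: the function name build_parser and the choice of store_true flags are assumptions, while the option names and the default output directory come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options; the real script may differ."""
    parser = argparse.ArgumentParser(prog="pokeflix_scraper.py")
    parser.add_argument("--url", "-u", required=True, help="Season page URL")
    parser.add_argument("--output", "-o", default="~/Downloads/Pokemon",
                        help="Output directory")
    parser.add_argument("--start", "-s", type=int, default=None,
                        help="First episode number to download")
    parser.add_argument("--end", "-e", type=int, default=None,
                        help="Last episode number to download")
    parser.add_argument("--resume", "-r", action="store_true",
                        help="Resume from previous state")
    parser.add_argument("--dry-run", "-n", action="store_true",
                        help="Extract URLs only, no download")
    parser.add_argument("--headless", action="store_true",
                        help="Run browser without visible window")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable debug logging")
    return parser


if __name__ == "__main__":
    # Placeholder URL, used only to show how the flags parse.
    args = build_parser().parse_args(["-u", "https://example.invalid/season", "-n"])
    print(args)
```

Because --headless is modeled as an opt-in flag here, the browser window stays visible unless the flag is passed.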

Output Structure:

~/Pokemon/
├── Pokemon Indigo League/
│   ├── E01 - Pokemon I Choose You.mp4
│   ├── E02 - Pokemon Emergency.mp4
│   ├── E03 - Ash Catches a Pokemon.mp4
│   └── download_state.json
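
The naming scheme in the tree above can be reproduced with a small helper. This is a sketch inferred from the example filenames only: the function name episode_path and the two-digit zero padding are assumptions, and the real script may sanitize titles differently.

```python
from pathlib import Path


def episode_path(output_dir: str, season_name: str, number: int, title: str) -> Path:
    """Build '<output>/<season>/E01 - <title>.mp4' (layout inferred from the example tree)."""
    # Two-digit zero padding is an assumption; numbers above 99 simply use more digits.
    filename = f"E{number:02d} - {title}.mp4"
    return Path(output_dir).expanduser() / season_name / filename


if __name__ == "__main__":
    print(episode_path("~/Pokemon", "Pokemon Indigo League", 1, "Pokemon I Choose You"))
```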

State File:

The download_state.json file tracks per-episode progress:

{
  "season_url": "https://...",
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {
      "number": 1,
      "title": "Pokemon I Choose You",
      "page_url": "https://...",
      "video_url": "https://...",
      "downloaded": true,
      "error": null
    }
  },
  "last_updated": "2025-01-22T..."
}
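
Because the state file is plain JSON, it can be inspected outside the scraper. The sketch below lists episodes that are not yet marked as downloaded, assuming the field names shown above; the helper names load_state and pending_episodes are hypothetical and not part of the script.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_state(state_path: str) -> Dict[str, Any]:
    """Read download_state.json into a dict."""
    return json.loads(Path(state_path).expanduser().read_text(encoding="utf-8"))


def pending_episodes(state: Dict[str, Any]) -> List[int]:
    """Return episode numbers whose 'downloaded' flag is not true, in ascending order."""
    pending = [
        int(key)
        for key, episode in state.get("episodes", {}).items()
        if not episode.get("downloaded", False)
    ]
    return sorted(pending)


if __name__ == "__main__":
    # Inline sample shaped like the state file above, with made-up values.
    sample = {
        "episodes": {
            "1": {"number": 1, "downloaded": True, "error": None},
            "2": {"number": 2, "downloaded": False, "error": "timeout"},
        }
    }
    print(pending_episodes(sample))
```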

Adding New Scrapers

To add a scraper for a new site:

  1. Copy the pattern from pokeflix_scraper.py
  2. Modify the selectors for episode list extraction
  3. Modify the iframe/video URL selectors for the new site's player
  4. Test with --dry-run first

Key methods to customize:

  • get_season_info() - Extract episode list from season page
  • extract_video_url() - Get video URL from episode page
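
One way to picture the two customization points is as an abstract base class that each site-specific scraper fills in. Only the method names get_season_info and extract_video_url come from this document; the signatures, the Episode dataclass, the SiteScraper base class, and the ExampleScraper stub are assumptions made for illustration, and the stub returns canned data instead of driving a browser.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    # Fields mirror the per-episode entries in download_state.json.
    number: int
    title: str
    page_url: str
    video_url: Optional[str] = None


class SiteScraper(ABC):
    """Hypothetical base class; the real script may be organized differently."""

    @abstractmethod
    def get_season_info(self, season_url: str) -> List[Episode]:
        """Extract the episode list from the season page."""

    @abstractmethod
    def extract_video_url(self, episode: Episode) -> Optional[str]:
        """Return the video URL for one episode page, or None if not found."""

    def dry_run(self, season_url: str) -> List[Episode]:
        """Resolve video URLs for every episode without downloading anything."""
        episodes = self.get_season_info(season_url)
        for episode in episodes:
            episode.video_url = self.extract_video_url(episode)
        return episodes


class ExampleScraper(SiteScraper):
    """Stub with canned data standing in for site-specific selectors."""

    def get_season_info(self, season_url: str) -> List[Episode]:
        return [Episode(number=1, title="Pilot", page_url=f"{season_url}/1")]

    def extract_video_url(self, episode: Episode) -> Optional[str]:
        return episode.page_url + "/video.mp4"


if __name__ == "__main__":
    for ep in ExampleScraper().dry_run("https://example.invalid/show"):
        print(ep.number, ep.title, ep.video_url)
```

Keeping the URL extraction separate from downloading is what makes a --dry-run style check cheap when testing selectors for a new site.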

Performance Notes

  • Non-headless mode (the default) is recommended to avoid anti-bot detection
  • Random delays (2-5s) between requests reduce the risk of rate limiting
  • Large seasons (80+ episodes) may take hours; use --resume if a run is interrupted
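
The random delay between requests can be sketched as below. The 2-5 second range comes from the note above; the function name polite_pause and the injectable sleep parameter are assumptions added so the helper can be exercised without actually waiting.

```python
import random
import time
from typing import Callable


def polite_pause(min_s: float = 2.0, max_s: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random duration between min_s and max_s seconds and return it."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Pass a no-op sleep to see the chosen delay without waiting.
    print(polite_pause(sleep=lambda seconds: None))
```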