---
title: "Media Tools Scripts Reference"
description: "Usage reference for media download scripts including pokeflix_scraper.py CLI options, output structure, state file format, and guide for adding new scrapers."
type: reference
domain: media-tools
tags: [pokeflix, scraper, yt-dlp, playwright, cli, scripts]
---

# Media Tools Scripts

Operational scripts for media downloading and management.

## Scripts

### pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv.

**Dependencies:**
```bash
pip install playwright yt-dlp
playwright install chromium
```

**Quick Start:**
```bash
# Download entire season
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/

# Download episodes 1-10 only
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --start 1 --end 10

# Resume interrupted download
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --resume

# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --dry-run --verbose
```

**CLI Options:**

| Option | Description |
|--------|-------------|
| `--url, -u` | Season page URL (required) |
| `--output, -o` | Output directory (default: ~/Downloads/Pokemon) |
| `--start, -s` | First episode number to download |
| `--end, -e` | Last episode number to download |
| `--resume, -r` | Resume from previous state |
| `--dry-run, -n` | Extract URLs only, no download |
| `--headless` | Run browser without visible window |
| `--verbose, -v` | Enable debug logging |

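For anyone writing a sibling script with the same interface, the option table above could be expressed with `argparse` roughly as follows. This is a minimal sketch, not the actual parser from `pokeflix_scraper.py`: the function name `build_parser`, the help strings, and the example URL are assumptions, and only the flag names and the `--output` default come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the script's real code)."""
    parser = argparse.ArgumentParser(description="Download episodes from a season page")
    parser.add_argument("--url", "-u", required=True, help="Season page URL")
    parser.add_argument("--output", "-o", default="~/Downloads/Pokemon", help="Output directory")
    parser.add_argument("--start", "-s", type=int, default=None, help="First episode number")
    parser.add_argument("--end", "-e", type=int, default=None, help="Last episode number")
    parser.add_argument("--resume", "-r", action="store_true", help="Resume from previous state")
    parser.add_argument("--dry-run", "-n", action="store_true", help="Extract URLs only")
    parser.add_argument("--headless", action="store_true", help="Run browser without a window")
    parser.add_argument("--verbose", "-v", action="store_true", help="Enable debug logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-u", "https://example.com/season", "-s", "1", "-e", "10", "-n"]
)
print(args.start, args.end, args.dry_run, args.output)
```

Because `--headless` is a `store_true` flag in this sketch, leaving it off keeps the browser window visible, which matches the documented default.
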
**Output Structure:**
```
~/Pokemon/
├── Pokemon Indigo League/
│   ├── E01 - Pokemon I Choose You.mp4
│   ├── E02 - Pokemon Emergency.mp4
│   ├── E03 - Ash Catches a Pokemon.mp4
│   └── download_state.json
```

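The naming scheme in the listing above can be reproduced with a small helper. This is a sketch under stated assumptions: `episode_path` is a hypothetical name, and the two-digit zero padding is inferred from the `E01`, `E02`, `E03` examples rather than confirmed from the script.

```python
from pathlib import Path


def episode_path(output_dir: str, season_name: str, number: int, title: str) -> Path:
    """Build '<output>/<season>/E<NN> - <title>.mp4' (padding width is an assumption)."""
    filename = f"E{number:02d} - {title}.mp4"
    return Path(output_dir).expanduser() / season_name / filename


path = episode_path("~/Pokemon", "Pokemon Indigo League", 2, "Pokemon Emergency")
print(path.name)
```
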
**State File:**

The `download_state.json` file tracks progress for each season:
```json
{
  "season_url": "https://...",
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {
      "number": 1,
      "title": "Pokemon I Choose You",
      "page_url": "https://...",
      "video_url": "https://...",
      "downloaded": true,
      "error": null
    }
  },
  "last_updated": "2025-01-22T..."
}
```

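A state file in this shape is easy to inspect from Python, for example to see which episodes a `--resume` run would still need to fetch. The sketch below is illustrative only: `pending_episodes` is a hypothetical helper, the sample data is made up, and the script's own resume logic may differ.

```python
import json


def pending_episodes(state: dict) -> list:
    """Return the numbers of episodes not yet marked as downloaded, ascending."""
    episodes = state.get("episodes", {})
    return sorted(int(key) for key, ep in episodes.items() if not ep.get("downloaded"))


# Made-up sample using only the fields shown in the state file example.
sample = json.loads("""
{
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {"number": 1, "title": "Pokemon I Choose You", "downloaded": true, "error": null},
    "2": {"number": 2, "title": "Pokemon Emergency", "downloaded": false, "error": null}
  }
}
""")

print(pending_episodes(sample))
```

To check a real run, load the file with `json.load(open(path))` in place of the inline sample.
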
## Adding New Scrapers

To add a scraper for a new site:

1. Copy the pattern from `pokeflix_scraper.py`
2. Modify the selectors for episode list extraction
3. Modify the iframe/video URL selectors for the new site's player
4. Test with `--dry-run` first

Key methods to customize:
- `get_season_info()` - Extract episode list from season page
- `extract_video_url()` - Get video URL from episode page

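One way to picture the two customization points is as an abstract interface that each site implements. The skeleton below is a sketch, not the structure of `pokeflix_scraper.py`: the `Episode` dataclass, the `SiteScraper` base class, the method signatures, and the stubbed return values are all assumptions. Only the method names `get_season_info` and `extract_video_url` come from this document.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    """Hypothetical record; field names follow the state file example."""
    number: int
    title: str
    page_url: str
    video_url: Optional[str] = None


class SiteScraper(ABC):
    """Hypothetical base class naming the two methods a new site must provide."""

    @abstractmethod
    def get_season_info(self, season_url: str) -> List[Episode]:
        """Extract the episode list from the season page."""

    @abstractmethod
    def extract_video_url(self, episode: Episode) -> Optional[str]:
        """Get the video URL from an episode page."""


class ExampleSiteScraper(SiteScraper):
    def get_season_info(self, season_url: str) -> List[Episode]:
        # A real scraper would load the page and query site-specific selectors here.
        return [Episode(1, "Pilot", f"{season_url}/episode-1")]

    def extract_video_url(self, episode: Episode) -> Optional[str]:
        # A real scraper would inspect the player iframe; this stub returns a placeholder.
        return f"{episode.page_url}/video.m3u8"


scraper = ExampleSiteScraper()
episodes = scraper.get_season_info("https://example.com/show")
print(scraper.extract_video_url(episodes[0]))
```

Keeping the site-specific logic behind these two methods means the download, state tracking, and CLI code can stay unchanged between sites.
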
## Performance Notes

- **Non-headless mode** (the default) is recommended to avoid anti-bot detection
- Random delays (2-5s) between requests help prevent rate limiting
- Large seasons (80+ episodes) may take hours; use `--resume` to continue after an interruption
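
The random delay mentioned above can be sketched with the standard library. This is an illustration assuming a uniform distribution over the documented 2-5 second range; `polite_delay` is a hypothetical name, and the script may pick its delays differently.

```python
import random
import time


def polite_delay(min_s: float = 2.0, max_s: float = 5.0, sleep=time.sleep) -> float:
    """Sleep for a random duration in [min_s, max_s] and return the chosen delay."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


# Inject a no-op sleep so the example finishes instantly; omit it in real use.
chosen = polite_delay(sleep=lambda seconds: None)
print(f"would wait {chosen:.1f}s before the next request")
```

Calling `polite_delay()` between episode page loads spaces requests out unevenly, which is harder to fingerprint than a fixed interval.
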