# Media Tools Scripts

Operational scripts for media downloading and management.

## Scripts

### pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv.

**Dependencies:**
```bash
pip install playwright yt-dlp
playwright install chromium
```

**Quick Start:**
```bash
# Download entire season
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/

# Download episodes 1-10 only
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --start 1 --end 10

# Resume interrupted download
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --resume

# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --dry-run --verbose
```
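For readers adapting the script, the flags used in the commands above (and listed in the CLI Options table) could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual code: the function name `build_parser` and the help strings are assumptions, and only the flag names and the default output directory come from this README.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="pokeflix_scraper.py")
    parser.add_argument("--url", "-u", required=True, help="Season page URL")
    parser.add_argument(
        "--output", "-o",
        type=Path,
        default=Path.home() / "Downloads" / "Pokemon",
        help="Output directory",
    )
    parser.add_argument("--start", "-s", type=int, help="First episode number")
    parser.add_argument("--end", "-e", type=int, help="Last episode number")
    parser.add_argument("--resume", "-r", action="store_true",
                        help="Resume from previous state")
    parser.add_argument("--dry-run", "-n", action="store_true",
                        help="Extract URLs only, no download")
    # No short form here: -h is reserved by argparse for --help.
    parser.add_argument("--headless", action="store_true",
                        help="Run browser without visible window")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable debug logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--url", "https://www.pokeflix.tv/browse/pokemon-indigo-league",
     "--start", "1", "--end", "10"]
)
print(args.start, args.end, args.resume, args.dry_run)  # 1 10 False False
```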
**CLI Options:**

| Option | Description |
|--------|-------------|
| `--url, -u` | Season page URL (required) |
| `--output, -o` | Output directory (default: ~/Downloads/Pokemon) |
| `--start, -s` | First episode number to download |
| `--end, -e` | Last episode number to download |
| `--resume, -r` | Resume from previous state |
| `--dry-run, -n` | Extract URLs only, no download |
| `--headless` | Run browser without visible window |
| `--verbose, -v` | Enable debug logging |

**Output Structure:**
```
~/Pokemon/
├── Pokemon Indigo League/
│   ├── E01 - Pokemon I Choose You.mp4
│   ├── E02 - Pokemon Emergency.mp4
│   ├── E03 - Ash Catches a Pokemon.mp4
│   └── download_state.json
```
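The file names in the tree above follow an `E<number> - <title>.mp4` pattern. A helper that produces such names might look like the sketch below. The function `episode_filename` is hypothetical, and both the two-digit zero padding and the set of stripped characters are assumptions inferred from the listing, not confirmed behavior of the script.

```python
import re


def episode_filename(number: int, title: str, ext: str = "mp4") -> str:
    """Build a name like 'E01 - Pokemon I Choose You.mp4' (hypothetical helper)."""
    # Assumption: drop characters that are invalid on common filesystems.
    safe_title = re.sub(r'[<>:"/\\|?*]', "", title)
    # Collapse any whitespace runs left behind by the removed characters.
    safe_title = re.sub(r"\s+", " ", safe_title).strip()
    # Zero-pad to at least two digits so files sort in episode order.
    return f"E{number:02d} - {safe_title}.{ext}"


print(episode_filename(1, "Pokemon I Choose You"))   # E01 - Pokemon I Choose You.mp4
print(episode_filename(3, "Ash Catches a Pokemon"))  # E03 - Ash Catches a Pokemon.mp4
```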
**State File:**

The `download_state.json` file tracks progress:
```json
{
  "season_url": "https://...",
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {
      "number": 1,
      "title": "Pokemon I Choose You",
      "page_url": "https://...",
      "video_url": "https://...",
      "downloaded": true,
      "error": null
    }
  },
  "last_updated": "2025-01-22T..."
}
```
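Because the state file is plain JSON, it can be inspected outside the scraper. The sketch below lists the episodes not yet marked as downloaded, assuming the layout shown above. The helper `pending_episodes` is hypothetical and is not part of `pokeflix_scraper.py`; the sample data is made up for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def pending_episodes(state_path: Path) -> list[int]:
    """Return episode numbers not yet marked as downloaded, in ascending order."""
    state = json.loads(state_path.read_text(encoding="utf-8"))
    return sorted(
        int(number)  # keys are strings in JSON, so convert before sorting
        for number, episode in state.get("episodes", {}).items()
        if not episode.get("downloaded", False)
    )


# Demo with a throwaway state file so the sketch runs without a real download.
sample_state = {
    "season_name": "Pokemon Indigo League",
    "episodes": {
        "1": {"number": 1, "downloaded": True, "error": None},
        "2": {"number": 2, "downloaded": False, "error": None},
    },
}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "download_state.json"
    demo_path.write_text(json.dumps(sample_state), encoding="utf-8")
    print(pending_episodes(demo_path))  # [2]
```

Sorting by the integer value rather than the string key keeps episode 10 after episode 2.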
## Adding New Scrapers

To add a scraper for a new site:

1. Copy the pattern from `pokeflix_scraper.py`
2. Modify the selectors for episode list extraction
3. Modify the iframe/video URL selectors for the new site's player
4. Test with `--dry-run` first

Key methods to customize:
- `get_season_info()` - Extract episode list from season page
- `extract_video_url()` - Get video URL from episode page
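The two methods above could be organized as overridable hooks, as in the sketch below. Everything except the method names `get_season_info` and `extract_video_url` is an assumption: the `Episode` dataclass, the `BaseScraper` class, the `dry_run` helper, the method signatures, and the stub return values are all hypothetical, and the real script may be structured differently. The browser logic is replaced by placeholders so the example runs without Playwright.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    """Fields mirror the per-episode keys in download_state.json."""
    number: int
    title: str
    page_url: str
    video_url: Optional[str] = None


class BaseScraper(ABC):
    """Hypothetical base class holding the site-independent flow."""

    @abstractmethod
    def get_season_info(self, season_url: str) -> list[Episode]:
        """Return the episodes listed on the season page."""

    @abstractmethod
    def extract_video_url(self, episode: Episode) -> Optional[str]:
        """Return the video URL for one episode page, or None if not found."""

    def dry_run(self, season_url: str) -> list[Episode]:
        """Resolve video URLs without downloading anything."""
        episodes = self.get_season_info(season_url)
        for episode in episodes:
            episode.video_url = self.extract_video_url(episode)
        return episodes


class ExampleSiteScraper(BaseScraper):
    """Stub for a new site; replace the bodies with real selector logic."""

    def get_season_info(self, season_url: str) -> list[Episode]:
        # Placeholder: a real version would query the episode-list selectors.
        return [Episode(1, "Pilot", f"{season_url}/episode-1")]

    def extract_video_url(self, episode: Episode) -> Optional[str]:
        # Placeholder: a real version would inspect the player iframe.
        return f"{episode.page_url}/video.mp4"


for ep in ExampleSiteScraper().dry_run("https://example.com/show"):
    print(ep.number, ep.title, ep.video_url)
```

Keeping the shared flow in one place means a new site only needs the two site-specific methods, which matches the customization points listed above.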
## Performance Notes

- **Non-headless mode** (the default) is recommended to avoid anti-bot detection
- Random delays (2-5s) between requests reduce the risk of rate limiting
- Large seasons (80+ episodes) may take hours; use `--resume` to continue if a run is interrupted
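The random delay between requests can be expressed in a few lines. This is a minimal sketch assuming a uniform distribution over the 2-5 second range. The helper name `polite_delay` and its injectable `sleep` parameter are hypothetical, and the script may implement its delays differently.

```python
import random
import time
from typing import Callable


def polite_delay(
    min_s: float = 2.0,
    max_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random duration in [min_s, max_s] and return that duration."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


# Demo: inject a no-op sleep so the example finishes instantly.
for episode_number in (1, 2, 3):
    waited = polite_delay(sleep=lambda seconds: None)
    print(f"episode {episode_number}: would wait {waited:.1f}s before the next request")
```

In real use the default `time.sleep` applies, so calling `polite_delay()` between page loads is enough.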