claude-home/media-tools/CONTEXT.md
Cal Corum ceb4dd36a0 Add docker scripts, media-tools, VM management, and n8n workflow docs
Add CONTEXT.md for docker and VM management script directories.
Add media-tools documentation with Playwright scraping patterns.
Add Tdarr GPU monitor n8n workflow definition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 22:26:10 -06:00


# Media Tools
Tools for downloading and managing media from streaming sites.
## Overview
This directory contains utilities for:
- Extracting video URLs from streaming sites using browser automation
- Downloading videos via yt-dlp
- Managing download state for resumable operations
## Tools
### pokeflix_scraper.py
Downloads Pokémon episodes from pokeflix.tv, using Playwright for browser automation.
**Location:** `scripts/pokeflix_scraper.py`
**Features:**
- Extracts episode lists from season pages
- Handles iframe-embedded video players (Streamtape, Vidoza, etc.)
- Resumable downloads with state persistence
- Configurable episode ranges
- Dry-run mode for testing
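
The episode-range and dry-run features could look roughly like the sketch below. The actual option names and range syntax of `pokeflix_scraper.py` are not shown here, so `parse_episode_range`, `plan_downloads`, and the `1-3,7` spec format are hypothetical illustrations only.

```python
def parse_episode_range(spec):
    """Turn a spec like '1-3,7' into a sorted list of episode numbers.

    The spec format is an assumption made for this sketch.
    """
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part}")
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


def plan_downloads(episodes, wanted, dry_run):
    """Describe the work for the selected (number, title) pairs.

    In dry-run mode the descriptions say "would download", so the caller
    can print them and exit without fetching anything.
    """
    verb = "would download" if dry_run else "download"
    return [
        f"{verb} E{number:02d} - {title}"
        for number, title in episodes
        if number in wanted
    ]
```

Keeping the planning step free of side effects means a dry run and a real run share the same selection logic.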
## Architecture Pattern
These tools follow a common pattern:
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│   Playwright    │────▶│  Extract embed   │────▶│   yt-dlp    │
│   (navigate)    │     │   video URLs     │     │ (download)  │
└─────────────────┘     └──────────────────┘     └─────────────┘
```
**Why this approach:**
1. **Playwright** drives a real browser, so it can load JavaScript-heavy sites that reject plain HTTP requests
2. **Iframe extraction** locates the actual video URL when a site embeds its player from a third-party host
3. **yt-dlp** is the de facto standard for video downloading and supports a broad range of hosts
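
A minimal sketch of the three stages, not the code in `pokeflix_scraper.py`. It assumes the video player is a plain `<iframe>` on the episode page; the helper names and the `VIDEO_HOST_HINTS` list are hypothetical.

```python
import subprocess
from urllib.parse import urlparse

# Assumed host keywords, taken from the hosts named in this document.
VIDEO_HOST_HINTS = ("streamtape", "vidoza")


def pick_video_embeds(iframe_srcs, hints=VIDEO_HOST_HINTS):
    """Keep only iframe URLs whose hostname looks like a known video host."""
    picked = []
    for src in iframe_srcs:
        host = urlparse(src).hostname or ""
        if any(hint in host for hint in hints):
            picked.append(src)
    return picked


def extract_embed_urls(page_url):
    """Stages 1 and 2: load the page in Chromium and collect iframe sources."""
    # Imported lazily so the pure helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page()
            page.goto(page_url, wait_until="networkidle")
            srcs = page.eval_on_selector_all(
                "iframe", "els => els.map(el => el.src)"
            )
        finally:
            browser.close()
    return pick_video_embeds(srcs)


def build_ytdlp_command(embed_url, output_path):
    """Stage 3: the yt-dlp invocation; -o sets the output filename."""
    return ["yt-dlp", "-o", str(output_path), embed_url]


def download(embed_url, output_path):
    """Run yt-dlp and report success as a boolean."""
    result = subprocess.run(build_ytdlp_command(embed_url, output_path))
    return result.returncode == 0
```

Splitting URL filtering and command construction into pure functions keeps them testable without a browser or network access.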
## Dependencies
```bash
# Python packages
pip install playwright yt-dlp
# Playwright browser installation
playwright install chromium
```
## Common Patterns
### Anti-Bot Handling
- Start in headed browser mode (visible window)
- Add random delays of 2-5 seconds between requests
- Set a realistic viewport and user-agent
- Wait for the `networkidle` state after navigation
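
The settings above might be applied along these lines. This is a sketch under assumptions: the viewport size and user-agent string are placeholder values, and `pick_delay`, `polite_sleep`, and `open_page` are hypothetical helper names.

```python
import random
import time

# Assumed values; any realistic desktop viewport and user-agent would do.
VIEWPORT = {"width": 1280, "height": 800}
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def pick_delay(low=2.0, high=5.0, rng=random):
    """Choose a random delay in seconds between low and high."""
    return rng.uniform(low, high)


def polite_sleep(low=2.0, high=5.0):
    """Sleep for a random interval between requests and return its length."""
    delay = pick_delay(low, high)
    time.sleep(delay)
    return delay


def open_page(playwright, url):
    """Open url in headed Chromium with the viewport and user-agent above.

    `playwright` is the object yielded by `sync_playwright()`. The caller
    is responsible for closing the returned browser.
    """
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context(viewport=VIEWPORT, user_agent=USER_AGENT)
    page = context.new_page()
    page.goto(url, wait_until="networkidle")
    return browser, page
```

Calling `polite_sleep()` between episode pages spaces out requests unevenly rather than at a fixed interval.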
### State Management
- JSON state files track downloaded episodes
- Pass the `--resume` flag to skip completed downloads
- The state also records error information for debugging
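
One way such a state file could work is sketched below. The schema (a `completed` list and an `errors` mapping) and the function names are assumptions for illustration; the real `download_state.json` layout may differ.

```python
import json
from pathlib import Path


def load_state(state_path):
    """Read the JSON state file, or return an empty state if it is missing."""
    path = Path(state_path)
    if not path.exists():
        return {"completed": [], "errors": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(state_path, state):
    """Write to a temporary file, then rename it over the state file.

    This avoids leaving a half-written file if the process is interrupted.
    """
    path = Path(state_path)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def record_result(state, episode_id, error=None):
    """Mark an episode as completed, or store its error message."""
    if error is None:
        if episode_id not in state["completed"]:
            state["completed"].append(episode_id)
        state["errors"].pop(episode_id, None)
    else:
        state["errors"][episode_id] = str(error)


def pending_episodes(all_ids, state, resume):
    """Return the episodes still to download; skip completed ones on resume."""
    if not resume:
        return list(all_ids)
    done = set(state["completed"])
    return [ep for ep in all_ids if ep not in done]
```

Saving after every episode, rather than once at the end, is what makes an interrupted run resumable.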
### Output Organization
```
{output_dir}/
├── {Season Name}/
│   ├── E01 - Episode Title.mp4
│   ├── E02 - Episode Title.mp4
│   └── download_state.json
```
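
Paths in this layout could be built as in the sketch below. The two-digit episode padding follows the `E01` pattern shown above; the `safe_name` sanitizing rule and the function names are assumptions.

```python
import re
from pathlib import Path


def safe_name(text):
    """Replace characters that are invalid in filenames on common systems."""
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", text)
    return cleaned.strip().rstrip(".")


def episode_path(output_dir, season_name, number, title):
    """Build {output_dir}/{Season Name}/E01 - Episode Title.mp4."""
    filename = f"E{number:02d} - {safe_name(title)}.mp4"
    return Path(output_dir) / safe_name(season_name) / filename


def state_path(output_dir, season_name):
    """Build the per-season download_state.json path."""
    return Path(output_dir) / safe_name(season_name) / "download_state.json"
```

Because the state file lives inside the season directory, each season can be resumed independently.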
## When to Use These Tools
- Downloading entire seasons of shows for offline viewing
- Archiving content before it becomes unavailable
- Building a local media library
## Legal Considerations
These tools are for personal archival use. Respect copyright laws in your jurisdiction.