---
title: "Media Tools Overview"
description: "Directory overview for media downloading tools using Playwright browser automation and yt-dlp, covering architecture patterns, anti-bot handling, and state management."
type: context
domain: media-tools
tags: [yt-dlp, playwright, web-scraping, video-download, browser-automation]
---

# Media Tools

Tools for downloading and managing media from streaming sites.

## Overview

This directory contains utilities for:

- Extracting video URLs from streaming sites using browser automation
- Downloading videos via yt-dlp
- Managing download state for resumable operations

## Tools

### pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv using Playwright for browser automation.

**Location:** `scripts/pokeflix_scraper.py`

**Features:**

- Extracts episode lists from season pages
- Handles iframe-embedded video players (Streamtape, Vidoza, etc.)
- Resumable downloads with state persistence
- Configurable episode ranges
- Dry-run mode for testing

## Architecture Pattern

These tools follow a common pattern:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│ Playwright      │────▶│ Extract embed    │────▶│ yt-dlp      │
│ (navigate)      │     │ video URLs       │     │ (download)  │
└─────────────────┘     └──────────────────┘     └─────────────┘
```

**Why this approach:**

1. **Playwright** handles JavaScript-heavy sites that block simple HTTP requests
2. **Iframe extraction** works around sites that use third-party video hosts
3. **yt-dlp** is the de-facto standard for video downloading with broad host support

## Dependencies

```bash
# Python packages
pip install playwright yt-dlp

# Playwright browser installation
playwright install chromium
```

## Common Patterns

### Anti-Bot Handling

- Use headed browser mode (visible window) initially
- Random delays between requests (2-5 seconds)
- Realistic viewport and user-agent settings
- Wait for `networkidle` state after navigation

### State Management

- JSON state files track downloaded episodes
- Enable `--resume` flag to skip completed downloads
- State includes error information for debugging

### Output Organization

```
{output_dir}/
├── {Season Name}/
│   ├── E01 - Episode Title.mp4
│   ├── E02 - Episode Title.mp4
│   └── download_state.json
```

## When to Use These Tools

- Downloading entire seasons of shows for offline viewing
- Archiving content before it becomes unavailable
- Building a local media library

## Legal Considerations

These tools are for personal archival use. Respect copyright laws in your jurisdiction.
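
## Example: State File Sketch

The state management pattern described under Common Patterns (a JSON file that tracks downloaded episodes, supports resuming, and keeps error information) can be illustrated with a minimal sketch. This is not the code in `pokeflix_scraper.py`: the schema (`completed` and `errors` keys) and every function name below are assumptions made for illustration. Only the `download_state.json` file name comes from the output layout shown earlier.

```python
import json
from pathlib import Path

# File name taken from the output layout; everything else here is hypothetical.
STATE_FILENAME = "download_state.json"


def load_state(season_dir):
    """Return the saved state, or an empty state if no file exists yet."""
    state_path = Path(season_dir) / STATE_FILENAME
    if state_path.exists():
        return json.loads(state_path.read_text(encoding="utf-8"))
    # Assumed schema: a list of finished episodes plus a map of episode -> error.
    return {"completed": [], "errors": {}}


def save_state(season_dir, state):
    """Write to a temp file first, then rename, so a crash cannot leave a half-written file."""
    season_dir = Path(season_dir)
    season_dir.mkdir(parents=True, exist_ok=True)
    tmp_path = season_dir / (STATE_FILENAME + ".tmp")
    tmp_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp_path.replace(season_dir / STATE_FILENAME)


def record_result(state, episode, error=None):
    """Mark an episode as completed, or store its error message for debugging."""
    if error is None:
        if episode not in state["completed"]:
            state["completed"].append(episode)
        # A successful retry clears any earlier error for this episode.
        state["errors"].pop(episode, None)
    else:
        state["errors"][episode] = error


def pending_episodes(episodes, state, resume):
    """With resume enabled, skip completed episodes; otherwise return all of them."""
    if not resume:
        return list(episodes)
    done = set(state["completed"])
    return [ep for ep in episodes if ep not in done]
```

In this sketch, failed episodes stay out of `completed`, so a resumed run retries them while their last error message remains available in the state file until a retry succeeds.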