claude-home/media-tools/CONTEXT.md
Cal Corum 4b7eca8a46
2026-03-12 09:00:44 -05:00

title: Media Tools Overview
description: Directory overview for media downloading tools using Playwright browser automation and yt-dlp, covering architecture patterns, anti-bot handling, and state management.
type: context
domain: media-tools
tags: yt-dlp, playwright, web-scraping, video-download, browser-automation

Media Tools

Tools for downloading and managing media from streaming sites.

Overview

This directory contains utilities for:

  • Extracting video URLs from streaming sites using browser automation
  • Downloading videos via yt-dlp
  • Managing download state for resumable operations

Tools

pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv using Playwright for browser automation.

Location: scripts/pokeflix_scraper.py

Features:

  • Extracts episode lists from season pages
  • Handles iframe-embedded video players (Streamtape, Vidoza, etc.)
  • Resumable downloads with state persistence
  • Configurable episode ranges
  • Dry-run mode for testing
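
As a rough illustration of how episode ranges and dry-run mode could be exposed on the command line, here is a minimal argparse sketch. Only `--resume` is named in this document; `--episodes`, `--dry-run`, `parse_episode_range`, and `build_parser` are hypothetical names chosen for the example and may not match the actual script.

```python
import argparse


def parse_episode_range(text):
    """Turn '5' or '3-7' into an inclusive range of episode numbers."""
    start, _, end = text.partition("-")
    first = int(start)
    last = int(end) if end else first
    if first < 1 or last < first:
        raise argparse.ArgumentTypeError(f"invalid episode range: {text!r}")
    return range(first, last + 1)


def build_parser():
    """Hypothetical CLI; flag names other than --resume are assumptions."""
    parser = argparse.ArgumentParser(description="Illustrative scraper CLI")
    parser.add_argument("--episodes", type=parse_episode_range, default=None,
                        help="single episode or inclusive range, e.g. 3-7")
    parser.add_argument("--dry-run", action="store_true",
                        help="list what would be downloaded, then exit")
    parser.add_argument("--resume", action="store_true",
                        help="skip episodes already recorded as complete")
    return parser
```

With this sketch, `build_parser().parse_args(["--episodes", "3-5", "--dry-run"])` yields episodes 3 through 5 with dry-run enabled and resume left off.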

Architecture Pattern

These tools follow a common pattern:

┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│  Playwright     │────▶│  Extract embed   │────▶│  yt-dlp     │
│  (navigate)     │     │  video URLs      │     │  (download) │
└─────────────────┘     └──────────────────┘     └─────────────┘

Why this approach:

  1. Playwright handles JavaScript-heavy sites that block simple HTTP requests
  2. Iframe extraction works around sites that use third-party video hosts
  3. yt-dlp is the de facto standard for video downloading and supports a wide range of video hosts
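
The three stages can be sketched roughly as follows. This is not the code in pokeflix_scraper.py: the function names are hypothetical, taking the first iframe found is a simplifying assumption, and real sites may need extra waits or host-specific handling. Playwright is imported lazily so the pure helpers work without it installed.

```python
import subprocess
from html.parser import HTMLParser


class IframeCollector(HTMLParser):
    """Collects the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def fetch_rendered_html(page_url):
    """Stage 1: let Playwright render the page, JavaScript included."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(page_url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


def extract_embed_urls(html):
    """Stage 2: pull embed URLs out of the rendered HTML."""
    collector = IframeCollector()
    collector.feed(html)
    return collector.sources


def build_download_command(embed_url, output_path):
    """Stage 3: build the yt-dlp invocation for one embed URL."""
    return ["yt-dlp", "--output", str(output_path), embed_url]


def download_episode(page_url, output_path):
    """Run all three stages for a single episode page."""
    embeds = extract_embed_urls(fetch_rendered_html(page_url))
    if not embeds:
        raise RuntimeError(f"no iframe found on {page_url}")
    # Assumption: the first iframe is the video player.
    subprocess.run(build_download_command(embeds[0], output_path), check=True)
```

Keeping extraction and command construction as pure functions means they can be tested without launching a browser or touching the network.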

Dependencies

# Python packages
pip install playwright yt-dlp

# Playwright browser installation
playwright install chromium

Common Patterns

Anti-Bot Handling

  • Use headed browser mode (visible window) initially
  • Random delays between requests (2-5 seconds)
  • Realistic viewport and user-agent settings
  • Wait for networkidle state after navigation
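
A minimal sketch of these habits, assuming Playwright's sync API. The viewport size and user-agent string are illustrative placeholders rather than the values the scraper uses, and `polite_delay` and `open_page` are hypothetical helper names.

```python
import random
import time

# Assumed example values; substitute whatever matches a real browser profile.
CONTEXT_OPTIONS = {
    "viewport": {"width": 1280, "height": 720},
    "user_agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
}


def polite_delay(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Sleep for a random interval and return how long it waited."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def open_page(playwright, url):
    """Open url in a headed Chromium window with the settings above.

    `playwright` is the object yielded by `sync_playwright()`.
    """
    browser = playwright.chromium.launch(headless=False)  # visible window
    context = browser.new_context(**CONTEXT_OPTIONS)
    page = context.new_page()
    page.goto(url, wait_until="networkidle")
    return browser, page
```

Calling `polite_delay()` between page loads spaces requests 2 to 5 seconds apart; the `sleep` parameter exists only so the delay can be stubbed out in tests.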

State Management

  • JSON state files track downloaded episodes
  • Pass the --resume flag to skip completed downloads
  • State includes error information for debugging
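
One way such a state file could work is sketched below. The JSON layout (`completed` and `errors` keys) and all function names are assumptions made for illustration; the actual format of download_state.json is not documented here.

```python
import json
from pathlib import Path


def load_state(state_path):
    """Read the state file, or start fresh if it does not exist yet."""
    path = Path(state_path)
    if not path.exists():
        return {"completed": [], "errors": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(state_path, state):
    """Write to a temp file first so a crash cannot leave a half-written file."""
    path = Path(state_path)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def record_result(state, episode, error=None):
    """Mark an episode as done, or store its error message for debugging."""
    if error is None:
        if episode not in state["completed"]:
            state["completed"].append(episode)
        state["errors"].pop(str(episode), None)
    else:
        state["errors"][str(episode)] = str(error)


def pending_episodes(episodes, state, resume):
    """With resume enabled, drop episodes already recorded as completed."""
    if not resume:
        return list(episodes)
    done = set(state["completed"])
    return [ep for ep in episodes if ep not in done]
```

Saving after every episode, rather than once at the end, is what makes an interrupted run resumable.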

Output Organization

{output_dir}/
├── {Season Name}/
│   ├── E01 - Episode Title.mp4
│   ├── E02 - Episode Title.mp4
│   └── download_state.json
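
A small sketch of how paths in this layout might be built. `safe_name` and `episode_path` are hypothetical helpers, and the set of replaced characters is an assumption covering common filesystem restrictions.

```python
import re
from pathlib import Path


def safe_name(text):
    """Replace characters that are invalid on common filesystems."""
    return re.sub(r'[<>:"/\\|?*]', "_", text).strip()


def episode_path(output_dir, season_name, number, title, ext="mp4"):
    """Build {output_dir}/{Season Name}/E01 - Episode Title.mp4."""
    filename = f"E{number:02d} - {safe_name(title)}.{ext}"
    return Path(output_dir) / safe_name(season_name) / filename
```

Zero-padding the episode number keeps files in order when sorted by name.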

When to Use These Tools

  • Downloading entire seasons of shows for offline viewing
  • Archiving content before it becomes unavailable
  • Building a local media library

These tools are for personal archival use. Respect copyright laws in your jurisdiction.