# Media Tools

Tools for downloading and managing media from streaming sites.

## Overview

This directory contains utilities for:

- Extracting video URLs from streaming sites using browser automation
- Downloading videos via yt-dlp
- Managing download state for resumable operations

## Tools

### pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv using Playwright for browser automation.

**Location:** `scripts/pokeflix_scraper.py`

**Features:**

- Extracts episode lists from season pages
- Handles iframe-embedded video players (Streamtape, Vidoza, etc.)
- Resumable downloads with state persistence
- Configurable episode ranges
- Dry-run mode for testing

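
The notes above don't say how ranges are passed to the script. As a minimal sketch, assuming a hypothetical comma-separated spec such as `1-10,15` (both the format and the `parse_episode_range` helper are illustrative, not taken from `pokeflix_scraper.py`), a range argument could be expanded into sorted episode numbers like this:

```python
from typing import List, Set


def parse_episode_range(spec: str) -> List[int]:
    """Expand a hypothetical spec such as "1-3,7" into [1, 2, 3, 7]."""
    episodes: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas, e.g. "1-3,"
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start > end:
                raise ValueError(f"invalid range {part!r}: start is after end")
            episodes.update(range(start, end + 1))
        else:
            episodes.add(int(part))
    # Overlapping ranges are merged because a set is used before sorting.
    return sorted(episodes)
```

In dry-run mode, the resulting list would only be printed rather than handed to the downloader.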
## Architecture Pattern

These tools follow a common pattern:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│   Playwright    │────▶│  Extract embed   │────▶│   yt-dlp    │
│   (navigate)    │     │   video URLs     │     │ (download)  │
└─────────────────┘     └──────────────────┘     └─────────────┘
```

**Why this approach:**

1. **Playwright** drives a real browser, so it handles JavaScript-heavy sites that block simple HTTP requests
2. **Iframe extraction** works around sites that embed their player from a third-party video host: the iframe's URL points at the host, and that URL is what gets downloaded
3. **yt-dlp** is the de facto standard for video downloading, with broad host support

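
The three stages in the diagram can be sketched in Python as follows. This is a minimal illustration, not the code in `pokeflix_scraper.py`: the function names are hypothetical, it assumes the player lives in an `<iframe>` whose `src` is the third-party embed URL, and whether yt-dlp can actually download a given host depends on its extractor support.

```python
from typing import Iterable, List


def filter_embed_urls(srcs: Iterable[str]) -> List[str]:
    """Keep http(s) iframe sources, dropping blanks, about:blank and duplicates (order kept)."""
    seen = set()
    urls: List[str] = []
    for src in srcs:
        src = (src or "").strip()
        if not src.startswith(("http://", "https://")) or src in seen:
            continue
        seen.add(src)
        urls.append(src)
    return urls


def extract_embed_urls(page_url: str) -> List[str]:
    """Stages 1-2: open the page in Chromium and collect each iframe's src URL."""
    # Imported here so filter_embed_urls works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # headed, see Anti-Bot Handling
        try:
            page = browser.new_page()
            page.goto(page_url, wait_until="networkidle")
            srcs = page.eval_on_selector_all("iframe", "els => els.map(e => e.src)")
        finally:
            browser.close()
    return filter_embed_urls(srcs)


def download(embed_url: str, out_template: str) -> None:
    """Stage 3: hand the embed URL to yt-dlp."""
    import yt_dlp  # imported lazily for the same reason

    with yt_dlp.YoutubeDL({"outtmpl": out_template}) as ydl:
        ydl.download([embed_url])
```

`outtmpl` is yt-dlp's standard output-template option, so the filename layout described under Output Organization would be set there.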
## Dependencies

```bash
# Python packages
pip install playwright yt-dlp

# Playwright browser installation
playwright install chromium
```

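
After installing, a quick sanity check can confirm that both Python packages are importable and that the `yt-dlp` command is on `PATH`. This is a hypothetical helper, not part of the scripts, and it cannot tell whether `playwright install chromium` has already downloaded the browser.

```python
import importlib.util
import shutil
from typing import Dict


def check_dependencies() -> Dict[str, bool]:
    """Report which of the dependencies listed above are available."""
    return {
        "playwright": importlib.util.find_spec("playwright") is not None,
        "yt_dlp": importlib.util.find_spec("yt_dlp") is not None,
        "yt-dlp (CLI)": shutil.which("yt-dlp") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```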
## Common Patterns

### Anti-Bot Handling

- Use headed browser mode (visible window) initially
- Random delays between requests (2-5 seconds)
- Realistic viewport and user-agent settings
- Wait for `networkidle` state after navigation

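
A sketch of these settings with Playwright's sync API follows. The 2-5 second window comes from the list above; the viewport size, user-agent string, and helper names are placeholder assumptions rather than the scraper's actual values.

```python
import random
import time
from typing import Callable

# The delay window matches the notes above; viewport and user agent are
# illustrative placeholders, not the scraper's real settings.
MIN_DELAY_S = 2.0
MAX_DELAY_S = 5.0
CONTEXT_OPTIONS = {
    "viewport": {"width": 1920, "height": 1080},
    "user_agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
}


def polite_delay(sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random 2-5 s interval between requests and return the delay used."""
    delay = random.uniform(MIN_DELAY_S, MAX_DELAY_S)
    sleep(delay)
    return delay


def open_page(playwright, url: str):
    """Launch headed Chromium with a realistic context and wait for network idle."""
    browser = playwright.chromium.launch(headless=False)  # visible window
    context = browser.new_context(**CONTEXT_OPTIONS)
    page = context.new_page()
    page.goto(url, wait_until="networkidle")
    return browser, page  # caller is responsible for browser.close()
```

`open_page` expects the object yielded by `with sync_playwright() as p:` and returns the browser so the caller can close it. Injecting `sleep` lets tests exercise `polite_delay` without actually waiting.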
### State Management

- JSON state files track downloaded episodes
- Pass the `--resume` flag to skip completed downloads
- The state also records error information for debugging

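
The state file's actual schema isn't documented here. Assuming a hypothetical layout with a `completed` list and an `errors` map keyed by episode label (the helper names are illustrative too), resume logic could look like this:

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_state(path: Path) -> Dict[str, Any]:
    """Read the state file, or start fresh if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": [], "errors": {}}


def save_state(path: Path, state: Dict[str, Any]) -> None:
    """Write to a temp file first, then replace the old state file in one step."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)


def mark_done(state: Dict[str, Any], episode: str) -> None:
    """Record a finished episode and clear any earlier failure for it."""
    if episode not in state["completed"]:
        state["completed"].append(episode)
    state["errors"].pop(episode, None)


def mark_failed(state: Dict[str, Any], episode: str, error: Exception) -> None:
    """Keep the exception type and message so failures can be debugged later."""
    state["errors"][episode] = f"{type(error).__name__}: {error}"


def should_skip(state: Dict[str, Any], episode: str, resume: bool) -> bool:
    """With --resume, episodes already recorded as completed are skipped."""
    return resume and episode in state["completed"]
```

Writing to a temporary file and then replacing the old one means an interrupted run on a POSIX filesystem leaves either the previous state or the new one on disk, never a half-written JSON file.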
### Output Organization

```
{output_dir}/
├── {Season Name}/
│   ├── E01 - Episode Title.mp4
│   ├── E02 - Episode Title.mp4
│   └── download_state.json
```

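
The layout above maps to a small path-building helper. The `safe_name` sanitization is an assumption (episode titles can contain characters such as `?` or `:` that some filesystems reject), and the real script may name files differently.

```python
import re
from pathlib import Path

# Characters invalid in Windows file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_name(name: str) -> str:
    """Replace characters that are not allowed in file or folder names."""
    cleaned = _INVALID_CHARS.sub("_", name).strip().rstrip(".")
    return cleaned or "untitled"


def episode_path(output_dir: Path, season: str, number: int, title: str) -> Path:
    """Build {output_dir}/{Season Name}/E01 - Episode Title.mp4."""
    filename = f"E{number:02d} - {safe_name(title)}.mp4"
    return output_dir / safe_name(season) / filename


def state_path(output_dir: Path, season: str) -> Path:
    """Each season folder keeps its own download_state.json."""
    return output_dir / safe_name(season) / "download_state.json"
```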
## When to Use These Tools

- Downloading entire seasons of shows for offline viewing
- Archiving content before it becomes unavailable
- Building a local media library

## Legal Considerations

These tools are for personal archival use. Respect copyright laws in your jurisdiction.