Add docker scripts, media-tools, VM management, and n8n workflow docs
Add CONTEXT.md for docker and VM management script directories. Add media-tools documentation with Playwright scraping patterns. Add Tdarr GPU monitor n8n workflow definition.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent b186107b97
commit ceb4dd36a0

92  docker/scripts/CONTEXT.md  Normal file
@@ -0,0 +1,92 @@
# Docker Scripts - Operational Context

## Script Overview

This directory is reserved for active operational scripts for Docker container management, orchestration, and automation.

**Current Status**: No operational scripts currently deployed. This structure is maintained for future Docker automation needs.

## Future Script Categories

### Planned Script Types

**Container Lifecycle Management**
- Start/stop scripts for complex multi-container setups
- Health check and restart automation
- Graceful shutdown procedures for dependent containers

**Maintenance Automation**
- Image cleanup and pruning scripts
- Volume backup and restoration
- Container log rotation and archiving
- Network cleanup and validation

**Monitoring and Alerts**
- Container health monitoring
- Resource usage tracking
- Discord/webhook notifications for container events
- Uptime and availability reporting
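A future health-monitoring script might look like the sketch below (hypothetical — no script is deployed yet; `parse_health` and the container names are illustrative, and `inspect_containers` requires the `docker` CLI on the host):

```python
#!/usr/bin/env python3
"""Sketch of a container health monitor (illustrative; nothing here is deployed)."""
import json
import subprocess


def parse_health(inspect_json: str) -> dict[str, str]:
    """Map container name -> health status from `docker inspect` JSON output."""
    results = {}
    for entry in json.loads(inspect_json):
        name = entry.get("Name", "").lstrip("/")
        state = entry.get("State", {})
        # Containers without a HEALTHCHECK have no "Health" key; fall back to State.Status.
        health = state.get("Health", {}).get("Status", state.get("Status", "unknown"))
        results[name] = health
    return results


def inspect_containers(names: list[str]) -> dict[str, str]:
    """Run `docker inspect` for the named containers and parse the result."""
    out = subprocess.run(
        ["docker", "inspect", *names],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_health(out)
```

A cron entry would call `inspect_containers(...)` and post any non-`healthy` status to a Discord webhook, once that integration exists.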
**Deployment Automation**
- CI/CD integration scripts
- Rolling update procedures
- Blue-green deployment automation
- Container migration tools

## Integration Points

### External Dependencies
- **Docker/Podman**: Container runtime
- **Docker Compose**: Multi-container orchestration
- **cron**: System scheduler for automation
- **Discord Webhooks**: Notification integration (when implemented)

### File System Dependencies
- **Container Volumes**: Various locations depending on service
- **Configuration Files**: Service-specific docker-compose.yml files
- **Log Files**: Container and automation logs
- **Backup Storage**: For volume snapshots and exports

### Network Dependencies
- **Docker Networks**: Bridge, host, and custom networks
- **External Services**: APIs and webhooks for monitoring
- **Registry Access**: For image pulls and pushes (when needed)

## Development Guidelines

### When Adding New Scripts

**Documentation Requirements**:
1. Add a script description to this CONTEXT.md under the appropriate category
2. Include usage examples and command-line options
3. Document dependencies and prerequisites
4. Specify the cron schedule if automated
5. Add a troubleshooting section for common issues

**Script Standards**:
```bash
#!/bin/bash
# Script name and purpose
# Dependencies: list required commands/services
# Usage: ./script.sh [options]

set -euo pipefail  # Strict error handling
```

**Testing Requirements**:
- Test with both Docker and Podman where applicable
- Verify error handling and logging
- Document failure modes and recovery procedures
- Include a dry-run or test mode where appropriate

## Related Documentation

- **Technology Overview**: `/docker/CONTEXT.md`
- **Troubleshooting**: `/docker/troubleshooting.md`
- **Examples**: `/docker/examples/` - Reference configurations and patterns
- **Main Instructions**: `/CLAUDE.md` - Context loading rules

## Notes

This directory structure is maintained to support future Docker automation needs while keeping operational scripts organized and documented according to the technology-first documentation pattern established in the claude-home repository.

When scripts are added, this file should be updated with specific operational context, similar to the comprehensive documentation in `/tdarr/scripts/CONTEXT.md`.
82  media-tools/CONTEXT.md  Normal file
@@ -0,0 +1,82 @@
# Media Tools

Tools for downloading and managing media from streaming sites.

## Overview

This directory contains utilities for:
- Extracting video URLs from streaming sites using browser automation
- Downloading videos via yt-dlp
- Managing download state for resumable operations

## Tools

### pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv using Playwright for browser automation.

**Location:** `scripts/pokeflix_scraper.py`

**Features:**
- Extracts episode lists from season pages
- Handles iframe-embedded video players (Streamtape, Vidoza, etc.)
- Resumable downloads with state persistence
- Configurable episode ranges
- Dry-run mode for testing

## Architecture Pattern

These tools follow a common pattern:

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│   Playwright    │────▶│  Extract embed   │────▶│   yt-dlp    │
│   (navigate)    │     │   video URLs     │     │ (download)  │
└─────────────────┘     └──────────────────┘     └─────────────┘
```

**Why this approach:**
1. **Playwright** handles JavaScript-heavy sites that block simple HTTP requests
2. **Iframe extraction** works around sites that use third-party video hosts
3. **yt-dlp** is the de facto standard for video downloading, with broad host support
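The three stages above can be sketched roughly as follows (illustrative only — the iframe regex, the `/watch/` URL, and the helper names are assumptions, not code from the scraper itself):

```python
"""Sketch of the navigate -> extract -> download pipeline (illustrative only)."""
import re

# Hypothetical pattern: grab iframe embed sources out of raw page HTML.
EMBED_IFRAME_RE = re.compile(r'<iframe[^>]+src="([^"]+)"', re.IGNORECASE)


def extract_embed_urls(html: str) -> list[str]:
    """Extraction step: pull candidate embed-player URLs from page HTML."""
    return EMBED_IFRAME_RE.findall(html)


def ytdlp_command(video_url: str, output_path: str) -> list[str]:
    """Download step: build the yt-dlp invocation for one video."""
    return ["yt-dlp", "--no-warnings", "-o", output_path, video_url]


def fetch_page_html(url: str) -> str:
    """Navigation step (needs: pip install playwright && playwright install chromium)."""
    import asyncio
    from playwright.async_api import async_playwright

    async def _go() -> str:
        async with async_playwright() as pw:
            browser = await pw.chromium.launch(headless=False)  # headed: fewer bot blocks
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle")
            html = await page.content()
            await browser.close()
            return html

    return asyncio.run(_go())
```

A driver would call `fetch_page_html`, feed the result to `extract_embed_urls`, then run each URL through `subprocess.run(ytdlp_command(url, path))`.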
## Dependencies

```bash
# Python packages
pip install playwright yt-dlp

# Playwright browser installation
playwright install chromium
```

## Common Patterns

### Anti-Bot Handling
- Use headed browser mode (visible window) initially
- Random delays between requests (2-5 seconds)
- Realistic viewport and user-agent settings
- Wait for `networkidle` state after navigation

### State Management
- JSON state files track downloaded episodes
- Enable the `--resume` flag to skip completed downloads
- State includes error information for debugging

### Output Organization
```
{output_dir}/
├── {Season Name}/
│   ├── E01 - Episode Title.mp4
│   ├── E02 - Episode Title.mp4
│   └── download_state.json
```

## When to Use These Tools

- Downloading entire seasons of shows for offline viewing
- Archiving content before it becomes unavailable
- Building a local media library

## Legal Considerations

These tools are for personal archival use. Respect copyright laws in your jurisdiction.
103  media-tools/scripts/CONTEXT.md  Normal file
@@ -0,0 +1,103 @@
# Media Tools Scripts

Operational scripts for media downloading and management.

## Scripts

### pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv.

**Dependencies:**
```bash
pip install playwright yt-dlp
playwright install chromium
```

**Quick Start:**
```bash
# Download entire season
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/

# Download episodes 1-10 only
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --start 1 --end 10

# Resume interrupted download
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --resume

# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --dry-run --verbose
```

**CLI Options:**

| Option | Description |
|--------|-------------|
| `--url, -u` | Season page URL (required) |
| `--output, -o` | Output directory (default: `~/Downloads/Pokemon`) |
| `--start, -s` | First episode number to download |
| `--end, -e` | Last episode number to download |
| `--resume, -r` | Resume from previous state |
| `--dry-run, -n` | Extract URLs only, no download |
| `--headless` | Run browser without a visible window |
| `--verbose, -v` | Enable debug logging |

**Output Structure:**
```
~/Pokemon/
├── Pokemon Indigo League/
│   ├── E01 - Pokemon I Choose You.mp4
│   ├── E02 - Pokemon Emergency.mp4
│   ├── E03 - Ash Catches a Pokemon.mp4
│   └── download_state.json
```

**State File:**

The `download_state.json` file tracks progress:
```json
{
  "season_url": "https://...",
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {
      "number": 1,
      "title": "Pokemon I Choose You",
      "page_url": "https://...",
      "video_url": "https://...",
      "downloaded": true,
      "error": null
    }
  },
  "last_updated": "2025-01-22T..."
}
```

## Adding New Scrapers

To add a scraper for a new site:

1. Copy the pattern from `pokeflix_scraper.py`
2. Modify the selectors for episode list extraction
3. Modify the iframe/video URL selectors for the new site's player
4. Test with `--dry-run` first

Key methods to customize:
- `get_season_info()` - Extract episode list from season page
- `extract_video_url()` - Get video URL from episode page
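Those two methods can start from a skeleton like this (a sketch, not site-specific code — `NewSiteScraper`, the `/watch/` link pattern, and the `.mp4` regex are placeholder assumptions to swap for the target site's markup):

```python
"""Skeleton for a new-site scraper (placeholders only)."""
import re
from typing import Optional


class NewSiteScraper:
    # Hypothetical patterns; replace with the target site's real markup.
    EPISODE_LINK_RE = re.compile(r'href="(/watch/[^"]+)"')
    VIDEO_URL_RE = re.compile(r'(https://[^"\']+\.mp4)')

    def get_season_info(self, season_html: str) -> list[str]:
        """Extract the episode page list from a season page, de-duplicated in order."""
        seen: set[str] = set()
        urls: list[str] = []
        for href in self.EPISODE_LINK_RE.findall(season_html):
            if href not in seen:
                seen.add(href)
                urls.append(href)
        return urls

    def extract_video_url(self, episode_html: str) -> Optional[str]:
        """Get the first direct video URL from an episode page, or None if absent."""
        match = self.VIDEO_URL_RE.search(episode_html)
        return match.group(1) if match else None
```

Wire these into the existing download/state loop, and keep `--dry-run` as the first test of both regexes against a live page.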
## Performance Notes

- **Non-headless mode** is recommended (the default) to avoid anti-bot detection
- Random delays (2-5s) between requests prevent rate limiting
- Large seasons (80+ episodes) may take hours - use `--resume` if interrupted
777  media-tools/scripts/pokeflix_scraper.py  Executable file
@@ -0,0 +1,777 @@
#!/usr/bin/env python3
"""
Pokeflix Scraper - Download Pokemon episodes from pokeflix.tv

Pokeflix hosts videos directly on their CDN (v1.pkflx.com). This scraper:
1. Extracts the episode list from a season browse page
2. Visits each episode page to detect its CDN episode number
3. Downloads videos directly from the CDN via yt-dlp

Usage:
    # Download entire season
    python pokeflix_scraper.py --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" --output ~/Pokemon/

    # Download specific episode range
    python pokeflix_scraper.py --url "..." --start 1 --end 10 --output ~/Pokemon/

    # Resume interrupted download
    python pokeflix_scraper.py --url "..." --output ~/Pokemon/ --resume

    # Dry run (extract URLs only)
    python pokeflix_scraper.py --url "..." --dry-run

    # Choose quality
    python pokeflix_scraper.py --url "..." --quality 720p --output ~/Pokemon/

Dependencies:
    pip install playwright
    playwright install chromium
    # yt-dlp must be installed: pip install yt-dlp

Author: Cal Corum (with Jarvis assistance)
"""

import argparse
import asyncio
import json
import logging
import random
import re
import subprocess
import sys
from dataclasses import dataclass, field, asdict
from datetime import datetime
from pathlib import Path
from typing import Optional

try:
    from playwright.async_api import async_playwright, Page, Browser
except ImportError:
    print("ERROR: playwright not installed. Run: pip install playwright && playwright install chromium")
    sys.exit(1)

# ============================================================================
# Data Classes
# ============================================================================

@dataclass
class Episode:
    """Represents a single episode with its metadata and download status."""
    cdn_number: int  # The actual episode number on the CDN
    title: str
    page_url: str
    slug: str  # URL slug, e.g., "01-pokemon-i-choose-you"
    video_url: Optional[str] = None
    downloaded: bool = False
    error: Optional[str] = None

    @property
    def filename(self) -> str:
        """Generate a safe filename for the episode."""
        safe_title = re.sub(r'[<>:"/\\|?*]', '', self.title)
        safe_title = safe_title.strip()
        return f"E{self.cdn_number:02d} - {safe_title}.mp4"


@dataclass
class Season:
    """Represents a season/series with all its episodes."""
    name: str
    url: str
    cdn_slug: str  # e.g., "01-indigo-league" - used for CDN URLs
    episodes: list[Episode] = field(default_factory=list)

    @property
    def safe_name(self) -> str:
        """Generate a safe directory name for the season."""
        safe = re.sub(r'[<>:"/\\|?*]', '', self.name)
        return safe.strip()


@dataclass
class DownloadState:
    """Persistent state for resumable downloads."""
    season_url: str
    season_name: str
    cdn_slug: str
    episodes: dict[str, dict] = field(default_factory=dict)  # cdn_number (as str, for JSON round-tripping) -> episode dict
    episode_urls: list[str] = field(default_factory=list)  # All episode page URLs
    last_updated: str = ""

    def save(self, path: Path) -> None:
        """Save state to JSON file."""
        self.last_updated = datetime.now().isoformat()
        with open(path, 'w') as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def load(cls, path: Path) -> Optional['DownloadState']:
        """Load state from JSON file."""
        if not path.exists():
            return None
        with open(path) as f:
            data = json.load(f)
        return cls(**data)

# ============================================================================
# Logging Setup
# ============================================================================

def setup_logging(verbose: bool = False) -> logging.Logger:
    """Configure logging with console output."""
    logger = logging.getLogger('pokeflix_scraper')
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)

    if not logger.handlers:
        console = logging.StreamHandler()
        console.setLevel(logging.DEBUG if verbose else logging.INFO)
        console.setFormatter(logging.Formatter(
            '%(asctime)s [%(levelname)s] %(message)s',
            datefmt='%H:%M:%S'
        ))
        logger.addHandler(console)

    return logger

# ============================================================================
# Scraper Class
# ============================================================================

class PokeflixScraper:
    """
    Scrapes pokeflix.tv for video URLs and downloads them.

    Pokeflix hosts videos on their CDN with URLs like:
        https://v1.pkflx.com/hls/{season-slug}/{ep-num}/{ep-num}_{quality}.mp4

    The episode number must be detected by visiting each episode page,
    as the browse page URL slugs don't contain episode numbers.
    """

    BASE_URL = "https://www.pokeflix.tv"
    CDN_URL = "https://v1.pkflx.com/hls"

    # Map browse page URL slugs to CDN slugs
    SEASON_SLUG_MAP = {
        'pokemon-indigo-league': '01-indigo-league',
        'pokemon-adventures-in-the-orange-islands': '02-orange-islands',
        'pokemon-the-johto-journeys': '03-johto-journeys',
        'pokemon-johto-league-champions': '04-johto-league-champions',
        'pokemon-master-quest': '05-master-quest',
        'pokemon-advanced': '06-advanced',
        'pokemon-advanced-challenge': '07-advanced-challenge',
        'pokemon-advanced-battle': '08-advanced-battle',
        'pokemon-battle-frontier': '09-battle-frontier',
        'pokemon-diamond-and-pearl': '10-diamond-and-pearl',
        'pokemon-dp-battle-dimension': '11-battle-dimension',
        'pokemon-dp-galactic-battles': '12-galactic-battles',
        'pokemon-dp-sinnoh-league-victors': '13-sinnoh-league-victors',
        'pokemon-black-white': '14-black-and-white',
        'pokemon-bw-rival-destinies': '15-rival-destinies',
        'pokemon-bw-adventures-in-unova': '16-adventures-in-unova',
        'pokemon-xy': '17-xy',
        'pokemon-xy-kalos-quest': '18-kalos-quest',
        'pokemon-xyz': '19-xyz',
        'pokemon-sun-moon': '20-sun-and-moon',
        'pokemon-sun-moon-ultra-adventures': '21-ultra-adventures',
        'pokemon-sun-moon-ultra-legends': '22-ultra-legends',
        'pokemon-journeys': '23-journeys',
        'pokemon-master-journeys': '24-master-journeys',
        'pokemon-ultimate-journeys': '25-ultimate-journeys',
        'pokemon-horizons': '26-horizons',
    }
    def __init__(
        self,
        output_dir: Path,
        headless: bool = False,
        dry_run: bool = False,
        verbose: bool = False,
        quality: str = "1080p"
    ):
        self.output_dir = output_dir
        self.headless = headless
        self.dry_run = dry_run
        self.quality = quality
        self.logger = setup_logging(verbose)
        self.browser: Optional[Browser] = None
        self._context = None

    async def __aenter__(self):
        """Async context manager entry - launch browser."""
        playwright = await async_playwright().start()
        self.browser = await playwright.chromium.launch(
            headless=self.headless,
            args=['--disable-blink-features=AutomationControlled']
        )
        self._playwright = playwright
        # Create a persistent context to reuse
        self._context = await self.browser.new_context(
            viewport={'width': 1920, 'height': 1080},
            user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        )
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Async context manager exit - close browser."""
        if self._context:
            await self._context.close()
        if self.browser:
            await self.browser.close()
        await self._playwright.stop()

    async def _new_page(self) -> Page:
        """Create a new page using the shared context."""
        return await self._context.new_page()
    async def _random_delay(self, min_sec: float = 1.0, max_sec: float = 3.0):
        """Random delay to avoid detection."""
        delay = random.uniform(min_sec, max_sec)
        await asyncio.sleep(delay)

    async def _wait_for_cloudflare(self, page: Page, timeout: int = 60):
        """Wait for a Cloudflare challenge to be solved by the user."""
        try:
            # Check if we're on a Cloudflare challenge page
            is_cf = await page.query_selector('#challenge-running, .cf-browser-verification, [id*="challenge"]')
            if is_cf:
                self.logger.warning("Cloudflare challenge detected - please solve it in the browser window")
                self.logger.info(f"Waiting up to {timeout} seconds for challenge completion...")

                # Wait for the challenge to be solved (challenge element disappears)
                for _ in range(timeout):
                    await asyncio.sleep(1)
                    is_cf = await page.query_selector('#challenge-running, .cf-browser-verification, [id*="challenge"]')
                    if not is_cf:
                        self.logger.info("Cloudflare challenge completed!")
                        await asyncio.sleep(2)  # Wait for page to fully load
                        return True

                self.logger.error("Cloudflare challenge timeout - please try again")
                return False
        except Exception:
            pass
        return True
    def _get_cdn_slug(self, browse_url: str) -> Optional[str]:
        """Extract CDN slug from browse page URL."""
        match = re.search(r'/browse/([^/]+)', browse_url)
        if match:
            page_slug = match.group(1)
            if page_slug in self.SEASON_SLUG_MAP:
                return self.SEASON_SLUG_MAP[page_slug]
            self.logger.warning(f"Unknown season slug: {page_slug}, will try to detect from page")
        return None

    def _construct_video_url(self, cdn_slug: str, ep_num: int) -> str:
        """Construct direct CDN video URL."""
        return f"{self.CDN_URL}/{cdn_slug}/{ep_num:02d}/{ep_num:02d}_{self.quality}.mp4"

    def _slug_to_title(self, slug: str) -> str:
        """Convert URL slug to human-readable title."""
        # Remove season prefix like "01-"
        title_slug = re.sub(r'^\d+-', '', slug)
        # Convert to title case
        title = title_slug.replace('-', ' ').title()
        # Clean up common words
        title = re.sub(r'\bPokemon\b', 'Pokémon', title)
        return title
    async def get_episode_list(self, season_url: str) -> tuple[str, str, list[tuple[str, str]]]:
        """
        Get the list of episode URLs from a season browse page.

        Returns:
            Tuple of (season_name, cdn_slug, list of (page_url, slug) tuples)
        """
        self.logger.info(f"Fetching season page: {season_url}")

        cdn_slug = self._get_cdn_slug(season_url)

        page = await self._new_page()
        try:
            await page.goto(season_url, wait_until='networkidle', timeout=60000)
            await self._wait_for_cloudflare(page)
            await self._random_delay(2, 4)

            # Extract season title
            title_elem = await page.query_selector('h1, .season-title, .series-title')
            if not title_elem:
                title_elem = await page.query_selector('title')
            season_name = await title_elem.inner_text() if title_elem else "Unknown Season"
            season_name = season_name.replace('Pokéflix - Watch ', '').replace(' for free online!', '').strip()

            self.logger.info(f"Season: {season_name}")

            # Find all episode links with /v/ pattern
            links = await page.query_selector_all('a[href^="/v/"]')
            self.logger.info(f"Found {len(links)} episode links")

            # If we don't have the CDN slug, detect it from the first episode
            if not cdn_slug and links:
                first_href = await links[0].get_attribute('href')
                cdn_slug = await self._detect_cdn_slug(first_href)

            if not cdn_slug:
                self.logger.error("Could not determine CDN slug for this season")
                return season_name, "unknown", []

            self.logger.info(f"CDN slug: {cdn_slug}")

            # Collect all episode URLs
            episode_data = []
            seen_urls = set()

            for link in links:
                href = await link.get_attribute('href')
                if not href or href in seen_urls:
                    continue
                seen_urls.add(href)

                # Extract slug from URL
                slug_match = re.search(r'/v/(.+)', href)
                if slug_match:
                    slug = slug_match.group(1)
                    full_url = self.BASE_URL + href
                    episode_data.append((full_url, slug))

            return season_name, cdn_slug, episode_data

        finally:
            await page.close()
    async def _detect_cdn_slug(self, episode_href: str) -> Optional[str]:
        """Visit an episode page to detect the CDN slug from network requests."""
        self.logger.info("Detecting CDN slug from episode page...")

        detected_slug = None

        async def capture_request(request):
            nonlocal detected_slug
            if 'v1.pkflx.com/hls/' in request.url:
                match = re.search(r'hls/([^/]+)/', request.url)
                if match:
                    detected_slug = match.group(1)

        page = await self._new_page()
        page.on('request', capture_request)

        try:
            await page.goto(self.BASE_URL + episode_href, timeout=60000)
            await self._wait_for_cloudflare(page)
            await asyncio.sleep(5)
            return detected_slug
        finally:
            await page.close()
    async def get_episode_cdn_number(self, page_url: str) -> Optional[int]:
        """
        Visit an episode page and detect its CDN episode number.

        Returns:
            The episode number used in CDN URLs, or None if not detected
        """
        detected_num = None

        async def capture_request(request):
            nonlocal detected_num
            if 'v1.pkflx.com/hls/' in request.url:
                match = re.search(r'/(\d+)/\d+_', request.url)
                if match:
                    detected_num = int(match.group(1))

        page = await self._new_page()
        page.on('request', capture_request)

        try:
            await page.goto(page_url, timeout=60000)
            await self._wait_for_cloudflare(page)

            # Wait for initial load
            await asyncio.sleep(2)

            # Try to trigger video playback by clicking the play button or video area
            play_selectors = [
                'button[aria-label*="play" i]',
                '.play-button',
                '[class*="play"]',
                'video',
                '.video-player',
                '.player',
                '#player',
            ]

            for selector in play_selectors:
                try:
                    elem = await page.query_selector(selector)
                    if elem:
                        await elem.click()
                        await asyncio.sleep(0.5)
                        if detected_num:
                            break
                except Exception:
                    pass

            # Wait for video requests after click attempts
            for _ in range(10):  # Wait up to 5 seconds
                if detected_num:
                    break
                await asyncio.sleep(0.5)

            # If still not detected, try looking in the page source
            if not detected_num:
                content = await page.content()
                match = re.search(r'v1\.pkflx\.com/hls/[^/]+/(\d+)/', content)
                if match:
                    detected_num = int(match.group(1))

            return detected_num
        finally:
            await page.close()
    def download_video(self, video_url: str, output_path: Path) -> bool:
        """
        Download video using yt-dlp.

        Args:
            video_url: Direct CDN URL to the video
            output_path: Full path for output file

        Returns:
            True if download succeeded
        """
        if self.dry_run:
            self.logger.info(f"  [DRY RUN] Would download: {video_url}")
            self.logger.info(f"  To: {output_path}")
            return True

        self.logger.info(f"  Downloading: {output_path.name}")

        cmd = [
            'yt-dlp',
            '--no-warnings',
            '-o', str(output_path),
            '--no-part',
            video_url
        ]

        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=1800
            )

            if result.returncode == 0:
                self.logger.info("  Download complete!")
                return True
            else:
                self.logger.error(f"  yt-dlp error: {result.stderr}")
                return False

        except subprocess.TimeoutExpired:
            self.logger.error("  Download timed out after 30 minutes")
            return False
        except FileNotFoundError:
            self.logger.error("  yt-dlp not found. Install with: pip install yt-dlp")
            return False
    async def download_direct(
        self,
        season_url: str,
        start_ep: int,
        end_ep: int,
        resume: bool = False
    ) -> None:
        """
        Direct download mode - download episodes by number without visiting pages.

        This is faster and more reliable when you know the episode range.
        """
        # Get CDN slug from URL
        cdn_slug = self._get_cdn_slug(season_url)
        if not cdn_slug:
            self.logger.error("Unknown season - direct mode requires a known season URL")
            self.logger.info("Known seasons: " + ", ".join(self.SEASON_SLUG_MAP.keys()))
            return

        # In direct mode, output_dir is the final destination (no subfolder created)
        season_dir = self.output_dir
        season_dir.mkdir(parents=True, exist_ok=True)

        self.logger.info(f"Direct download mode: Episodes {start_ep}-{end_ep}")
        self.logger.info(f"CDN slug: {cdn_slug}")
        self.logger.info(f"Quality: {self.quality}")
        self.logger.info(f"Output: {season_dir}")

        downloaded = 0
        skipped = 0
        failed = 0

        for ep_num in range(start_ep, end_ep + 1):
            output_path = season_dir / f"E{ep_num:02d}.mp4"

            # Check if the file already exists
            if output_path.exists() and resume:
                self.logger.info(f"E{ep_num:02d}: Skipping (file exists)")
                skipped += 1
                continue

            video_url = self._construct_video_url(cdn_slug, ep_num)
            self.logger.info(f"E{ep_num:02d}: Downloading...")

            success = self.download_video(video_url, output_path)

            if success:
                downloaded += 1
            else:
                failed += 1

            if not self.dry_run:
                await self._random_delay(0.5, 1.5)

        self.logger.info(f"\nComplete! Downloaded: {downloaded}, Skipped: {skipped}, Failed: {failed}")
    async def download_season(
        self,
        season_url: str,
        start_ep: Optional[int] = None,
        end_ep: Optional[int] = None,
        resume: bool = False,
        direct: bool = False
    ) -> None:
        """
        Download all episodes from a season.

        Args:
            season_url: URL to the season browse page
            start_ep: First episode number to download (inclusive)
            end_ep: Last episode number to download (inclusive)
            resume: Whether to resume from previous state
            direct: If True, skip page visits and download by episode number
        """
        # Direct mode - just download by episode number
        if direct:
            if start_ep is None or end_ep is None:
                self.logger.error("Direct mode requires --start and --end episode numbers")
                return
            await self.download_direct(season_url, start_ep, end_ep, resume)
            return

        # Get episode list
        season_name, cdn_slug, episode_data = await self.get_episode_list(season_url)

        if not episode_data:
            self.logger.error("No episodes found!")
            return

        # Create output directory
        season_dir = self.output_dir / re.sub(r'[<>:"/\\|?*]', '', season_name).strip()
        season_dir.mkdir(parents=True, exist_ok=True)

        # State file for resume
        state_path = season_dir / 'download_state.json'
        state = None

        if resume:
            state = DownloadState.load(state_path)
            if state:
                self.logger.info(f"Resuming from previous state ({state.last_updated})")

        if not state:
            state = DownloadState(
                season_url=season_url,
                season_name=season_name,
                cdn_slug=cdn_slug,
                episode_urls=[url for url, _ in episode_data]
            )

        self.logger.info(f"Processing {len(episode_data)} episodes (quality: {self.quality})")

        # Process each episode
        downloaded_count = 0
        skipped_count = 0
        failed_count = 0

        for page_url, slug in episode_data:
            title = self._slug_to_title(slug)

            # Check if we already have this episode in state (by URL)
            existing_ep = None
            for ep_data in state.episodes.values():
                if ep_data.get('page_url') == page_url:
                    existing_ep = ep_data
                    break

            if existing_ep and existing_ep.get('downloaded') and resume:
                self.logger.info(f"Skipping: {title} (already downloaded)")
                skipped_count += 1
                continue

            # Get the CDN episode number by visiting the page
            self.logger.info(f"Checking: {title}")
            cdn_num = await self.get_episode_cdn_number(page_url)

            if cdn_num is None:
                self.logger.error("  Could not detect episode number, skipping")
                failed_count += 1
                continue

            # Check if within the requested range
            if start_ep is not None and cdn_num < start_ep:
                self.logger.info(f"  Episode {cdn_num} before start range, skipping")
                continue
            if end_ep is not None and cdn_num > end_ep:
                self.logger.info(f"  Episode {cdn_num} after end range, skipping")
                continue

            # Check if the file already exists
            output_path = season_dir / f"E{cdn_num:02d} - {title}.mp4"
            if output_path.exists() and resume:
                self.logger.info("  File exists, skipping")
                state.episodes[str(cdn_num)] = {
                    'cdn_number': cdn_num,
                    'title': title,
                    'page_url': page_url,
                    'slug': slug,
                    'video_url': self._construct_video_url(cdn_slug, cdn_num),
                    'downloaded': True,
                    'error': None
                }
                state.save(state_path)
                skipped_count += 1
                continue

            # Construct video URL and download
            video_url = self._construct_video_url(cdn_slug, cdn_num)
            self.logger.info(f"  Episode {cdn_num}: {title}")

            success = self.download_video(video_url, output_path)

            # Save state
            state.episodes[str(cdn_num)] = {
                'cdn_number': cdn_num,
                'title': title,
                'page_url': page_url,
                'slug': slug,
                'video_url': video_url,
                'downloaded': success,
                'error': None if success else "Download failed"
            }
            state.save(state_path)

            if success:
|
||||
downloaded_count += 1
|
||||
else:
|
||||
failed_count += 1
|
||||
|
||||
# Delay between episodes
|
||||
if not self.dry_run:
|
||||
await self._random_delay(1, 2)
|
||||
|
||||
# Summary
|
||||
self.logger.info(f"\nComplete!")
|
||||
self.logger.info(f" Downloaded: {downloaded_count}")
|
||||
self.logger.info(f" Skipped: {skipped_count}")
|
||||
self.logger.info(f" Failed: {failed_count}")
|
||||
self.logger.info(f" Output: {season_dir}")
|
||||
|
||||
|
||||
# ============================================================================
# CLI
# ============================================================================


def main():
    parser = argparse.ArgumentParser(
        description='Download Pokemon episodes from pokeflix.tv',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" --output ~/Pokemon/
  %(prog)s --url "..." --start 1 --end 10 --output ~/Pokemon/
  %(prog)s --url "..." --output ~/Pokemon/ --resume
  %(prog)s --url "..." --quality 720p --output ~/Pokemon/
  %(prog)s --url "..." --dry-run
"""
    )

    parser.add_argument(
        '--url', '-u',
        required=True,
        help='URL of the season/series page'
    )
    parser.add_argument(
        '--output', '-o',
        type=Path,
        default=Path.home() / 'Downloads' / 'Pokemon',
        help='Output directory (default: ~/Downloads/Pokemon)'
    )
    parser.add_argument(
        '--start', '-s',
        type=int,
        help='Start episode number (CDN number)'
    )
    parser.add_argument(
        '--end', '-e',
        type=int,
        help='End episode number (CDN number)'
    )
    parser.add_argument(
        '--quality', '-q',
        choices=['1080p', '720p', '360p'],
        default='1080p',
        help='Video quality (default: 1080p)'
    )
    parser.add_argument(
        '--resume', '-r',
        action='store_true',
        help='Resume from previous download state'
    )
    parser.add_argument(
        '--dry-run', '-n',
        action='store_true',
        help='Extract URLs only, do not download'
    )
    parser.add_argument(
        '--headless',
        action='store_true',
        help='Run browser in headless mode (may trigger anti-bot)'
    )
    parser.add_argument(
        '--verbose', '-v',
        action='store_true',
        help='Enable verbose logging'
    )
    parser.add_argument(
        '--direct',
        action='store_true',
        help='Direct download mode - skip page visits, just download episode range by number'
    )

    args = parser.parse_args()

    async def run():
        async with PokeflixScraper(
            output_dir=args.output,
            headless=args.headless,
            dry_run=args.dry_run,
            verbose=args.verbose,
            quality=args.quality
        ) as scraper:
            await scraper.download_season(
                season_url=args.url,
                start_ep=args.start,
                end_ep=args.end,
                resume=args.resume,
                direct=args.direct
            )

    asyncio.run(run())


if __name__ == '__main__':
    main()
195
media-tools/troubleshooting.md
Normal file
@ -0,0 +1,195 @@
# Media Tools Troubleshooting

## Common Issues

### Playwright Issues

#### "playwright not installed" Error
```
ERROR: playwright not installed. Run: pip install playwright && playwright install chromium
```

**Solution:**
```bash
pip install playwright
playwright install chromium
```

#### Browser Launch Fails
```
Error: Executable doesn't exist at /home/user/.cache/ms-playwright/chromium-xxx/chrome-linux/chrome
```

**Solution:**
```bash
playwright install chromium
```

#### Timeout Errors
```
TimeoutError: Timeout 30000ms exceeded
```

**Causes:**
- Slow network connection
- Site is blocking automated access
- Page structure has changed

**Solutions:**
1. Increase timeout in script
2. Try without `--headless` flag
3. Check if site is up manually

---
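When the cause is simply a slow connection, retrying with a progressively larger timeout often succeeds. A minimal sketch of that pattern (plain asyncio, not Playwright-specific; `with_retries` and its parameters are illustrative, not part of the scraper):

```python
import asyncio


async def with_retries(make_coro, attempts=3, base_timeout=30.0):
    """Run an async operation, doubling the allowed time on each retry."""
    timeout = base_timeout
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(make_coro(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise  # out of retries - surface the timeout
            timeout *= 2  # give slow pages progressively more time
```

Pass a factory (e.g. `lambda: page.goto(url)`) rather than a coroutine object, so each retry gets a fresh coroutine.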
### yt-dlp Issues

#### "yt-dlp not found" Error
```
yt-dlp not found. Install with: pip install yt-dlp
```

**Solution:**
```bash
pip install yt-dlp
```

#### Download Fails for Specific Host
```
ERROR: Unsupported URL: https://somehost.com/...
```

**Solution:**
```bash
# Update yt-dlp to latest version
pip install -U yt-dlp
```

If still failing, the host may be unsupported. Check [yt-dlp supported sites](https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md).

#### Slow Downloads
**Causes:**
- Video host throttling
- Network issues

**Solutions:**
- Downloads are typically limited by the source server
- Try at different times of day

---
### Scraping Issues

#### No Episodes Found
```
No episodes found!
```

**Causes:**
- Site structure has changed
- Page requires authentication
- Cloudflare protection triggered

**Solutions:**
1. Run without `--headless` to see what's happening
2. Check if the URL is correct and accessible manually
3. The site may have updated its HTML structure - check the selectors in the script

#### Video URL Not Found
```
No video URL found for episode X
```

**Causes:**
- Video is on an unsupported host
- Page uses a non-standard embedding method
- Anti-bot protection on the video player

**Solutions:**
1. Run with `--verbose` to see what URLs are being tried
2. Open the episode manually and check the Network tab for video requests
3. May need to add new iframe selectors for the specific host

#### 403 Forbidden on Site
**Cause:** Site is blocking automated requests

**Solutions:**
1. Ensure you're NOT using `--headless`
2. Increase random delays
3. Clear browser cache/cookies (restart script)
4. Try from a different IP

---
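As a starting point for supporting a new host, the embed URL can often be pulled straight from the raw page HTML. A rough sketch (regex-based and illustrative only; dynamically injected players need Playwright's frame APIs instead):

```python
import re

# Matches src="..." or src='...' inside an <iframe> tag
IFRAME_SRC = re.compile(r'<iframe[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def find_iframe_sources(html: str) -> list[str]:
    """Return candidate embed URLs found in raw page HTML."""
    return IFRAME_SRC.findall(html)
```

Any URL this returns can be tried directly with `yt-dlp "<url>"`.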
### Resume Issues

#### Resume Not Working
```
# Should skip downloaded episodes but re-downloads them
```

**Check:**
1. Ensure `download_state.json` exists in the output directory
2. Verify the `--resume` flag is being used
3. Check that episode numbers match between runs

#### Corrupt State File
```
JSONDecodeError: ...
```

**Solution:**
```bash
# Delete the state file to start fresh
rm /path/to/season/download_state.json
```

---
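Before deleting anything, it can help to see what the scraper thinks it has already done. A small sketch that inspects the state file (field names follow the `download_state.json` structure the scraper writes; treat the helper itself as illustrative):

```python
import json
from pathlib import Path


def summarize_state(path: Path) -> str:
    """Report per-episode download status from a download_state.json."""
    try:
        state = json.loads(path.read_text())
    except json.JSONDecodeError:
        return "corrupt state file - delete it to start fresh"
    episodes = state.get("episodes", {})
    done = sum(1 for ep in episodes.values() if ep.get("downloaded"))
    return f"{done}/{len(episodes)} episodes marked downloaded"
```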
## Debug Mode

Run with verbose output:
```bash
python pokeflix_scraper.py --url "..." --output ~/Pokemon/ --verbose
```

Run dry-run to test URL extraction:
```bash
python pokeflix_scraper.py --url "..." --dry-run --verbose
```

Watch the browser (non-headless):
```bash
python pokeflix_scraper.py --url "..." --output ~/Pokemon/
# (headless is off by default)
```

---

## Manual Workarounds

### If Automated Extraction Fails

1. **Browser DevTools method:**
   - Open episode in browser
   - F12 → Network tab → filter "m3u8" or "mp4"
   - Play video, copy the stream URL
   - Download manually: `yt-dlp "URL"`

2. **Check iframe manually:**
   - Right-click video player → Inspect
   - Find `<iframe>` element
   - Copy `src` attribute
   - Use that URL with yt-dlp

### Known Video Hosts

These hosts are typically supported by yt-dlp:
- Streamtape
- Vidoza
- Mp4upload
- Doodstream
- Filemoon
- Voe.sx

If the video is on an unsupported host, check if there's an alternative server/quality option on the episode page.
184
productivity/n8n/workflows/tdarr-gpu-monitor.json
Normal file
@ -0,0 +1,184 @@
{
  "name": "Tdarr GPU Monitor",
  "nodes": [
    {
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "minutes",
              "minutesInterval": 5
            }
          ]
        }
      },
      "id": "schedule-trigger",
      "name": "Every 5 Minutes",
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1.2,
      "position": [0, 0]
    },
    {
      "parameters": {
        "command": "ssh -i /root/.ssh/n8n_to_claude -o BatchMode=yes -o ConnectTimeout=10 cal@10.10.0.226 'docker exec tdarr-node nvidia-smi --query-gpu=name --format=csv,noheader 2>&1'"
      },
      "id": "check-gpu",
      "name": "Check GPU Access",
      "type": "n8n-nodes-base.executeCommand",
      "typeVersion": 1,
      "position": [220, 0]
    },
    {
      "parameters": {
        "conditions": {
          "options": {
            "caseSensitive": false,
            "leftValue": "",
            "typeValidation": "loose"
          },
          "conditions": [
            {
              "id": "gpu-error-check",
              "leftValue": "={{ $json.stdout + $json.stderr }}",
              "rightValue": "NVML|CUDA_ERROR|Unknown Error|no CUDA-capable|Failed to initialize",
              "operator": {
                "type": "string",
                "operation": "regex"
              }
            }
          ],
          "combinator": "or"
        }
      },
      "id": "check-error",
      "name": "GPU Error Detected?",
      "type": "n8n-nodes-base.if",
      "typeVersion": 2.2,
      "position": [440, 0]
    },
    {
      "parameters": {
        "command": "ssh -i /root/.ssh/n8n_to_claude -o BatchMode=yes -o ConnectTimeout=10 cal@10.10.0.226 'cd /home/cal/docker/tdarr && docker compose restart tdarr-node 2>&1'"
      },
      "id": "restart-container",
      "name": "Restart tdarr-node",
      "type": "n8n-nodes-base.executeCommand",
      "typeVersion": 1,
      "position": [660, -100]
    },
    {
      "parameters": {
        "command": "ssh -i /root/.ssh/n8n_to_claude -o BatchMode=yes -o ConnectTimeout=30 cal@10.10.0.226 'sleep 5 && docker exec tdarr-node nvidia-smi --query-gpu=name --format=csv,noheader 2>&1'"
      },
      "id": "verify-gpu",
      "name": "Verify GPU Restored",
      "type": "n8n-nodes-base.executeCommand",
      "typeVersion": 1,
      "position": [880, -100]
    },
    {
      "parameters": {
        "method": "POST",
        "url": "https://discord.com/api/webhooks/1451783909409816763/O9PMDiNt6ZIWRf8HKocIZ_E4vMGV_lEwq50aAiZ9HVFR2UGwO6J1N9_wOm82p0MetIqT",
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={\n  \"embeds\": [{\n    \"title\": \"Tdarr GPU Recovery\",\n    \"description\": \"GPU access was lost and container was automatically restarted.\",\n    \"color\": {{ $('Verify GPU Restored').item.json.stdout.includes('GTX') || $('Verify GPU Restored').item.json.stdout.includes('NVIDIA') ? 3066993 : 15158332 }},\n    \"fields\": [\n      {\n        \"name\": \"Original Error\",\n        \"value\": \"```{{ $('Check GPU Access').item.json.stdout.substring(0, 200) }}{{ $('Check GPU Access').item.json.stderr.substring(0, 200) }}```\",\n        \"inline\": false\n      },\n      {\n        \"name\": \"Recovery Status\",\n        \"value\": \"{{ $('Verify GPU Restored').item.json.stdout.includes('GTX') || $('Verify GPU Restored').item.json.stdout.includes('NVIDIA') ? '✅ GPU access restored' : '❌ GPU still not accessible - manual intervention needed' }}\",\n        \"inline\": false\n      },\n      {\n        \"name\": \"Post-Restart GPU\",\n        \"value\": \"```{{ $('Verify GPU Restored').item.json.stdout.substring(0, 200) }}```\",\n        \"inline\": false\n      }\n    ],\n    \"timestamp\": \"{{ new Date().toISOString() }}\",\n    \"footer\": {\n      \"text\": \"Tdarr GPU Monitor | ubuntu-manticore\"\n    }\n  }]\n}",
        "options": {}
      },
      "id": "discord-notify",
      "name": "Discord Notification",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [1100, -100]
    },
    {
      "parameters": {},
      "id": "no-action",
      "name": "GPU OK - No Action",
      "type": "n8n-nodes-base.noOp",
      "typeVersion": 1,
      "position": [660, 100]
    }
  ],
  "connections": {
    "Every 5 Minutes": {
      "main": [
        [
          {
            "node": "Check GPU Access",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Check GPU Access": {
      "main": [
        [
          {
            "node": "GPU Error Detected?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "GPU Error Detected?": {
      "main": [
        [
          {
            "node": "Restart tdarr-node",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "GPU OK - No Action",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Restart tdarr-node": {
      "main": [
        [
          {
            "node": "Verify GPU Restored",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Verify GPU Restored": {
      "main": [
        [
          {
            "node": "Discord Notification",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "settings": {
    "executionOrder": "v1"
  },
  "staticData": null,
  "tags": [
    {
      "name": "homelab"
    },
    {
      "name": "monitoring"
    },
    {
      "name": "tdarr"
    }
  ],
  "triggerCount": 1,
  "pinData": {}
}
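The workflow's "GPU Error Detected?" IF node keys off a regex over the combined stdout and stderr of the `nvidia-smi` command. The same check, sketched in Python so the pattern can be tested against captured output before touching the workflow:

```python
import re

# Same alternation the workflow's IF node applies to stdout + stderr
GPU_ERROR = re.compile(
    r"NVML|CUDA_ERROR|Unknown Error|no CUDA-capable|Failed to initialize"
)


def gpu_lost(nvidia_smi_output: str) -> bool:
    """True when nvidia-smi output indicates the container lost GPU access."""
    return bool(GPU_ERROR.search(nvidia_smi_output))
```

Note that healthy output such as a bare GPU name matches none of the alternatives, which is what lets the success branch fall through to "GPU OK - No Action".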
385
vm-management/scripts/CONTEXT.md
Normal file
@ -0,0 +1,385 @@
# VM Management Scripts - Operational Context

## Script Overview
This directory contains active operational scripts for VM provisioning, LXC container creation, Docker configuration in containers, and system migration.

## Core Scripts

### VM Post-Installation Provisioning
**Script**: `vm-post-install.sh`
**Purpose**: Automated provisioning of existing VMs with security hardening, SSH keys, and Docker

**Key Features**:
- System updates and essential package installation
- SSH key deployment (primary + emergency keys)
- SSH security hardening (disable password authentication)
- Docker and Docker Compose installation
- User environment setup with bash aliases
- Automatic security updates configuration

**Usage**:
```bash
./vm-post-install.sh <vm-ip> [ssh-user]

# Example
./vm-post-install.sh 10.10.0.100 cal
```

**Requirements**:
- Target VM must have SSH access enabled initially
- Homelab SSH keys must exist: `~/.ssh/homelab_rsa` and `~/.ssh/emergency_homelab_rsa`
- Initial connection may require password authentication (disabled after provisioning)

**Post-Provision Verification**:
```bash
# Test SSH with key
ssh cal@<vm-ip>

# Verify Docker
docker --version
docker run --rm hello-world

# Check security
sudo sshd -T | grep -E "(passwordauth|pubkeyauth)"
```

### Cloud-Init Automated Provisioning
**File**: `cloud-init-user-data.yaml`
**Purpose**: Fully automated VM provisioning from first boot using Proxmox cloud-init

**Features**:
- User creation with sudo privileges
- SSH keys pre-installed (no password auth needed)
- Automatic package updates
- Docker and Docker Compose installation
- Security hardening from first boot
- Useful bash aliases and environment setup
- Welcome message with system status

**Usage in Proxmox**:
1. Create new VM with cloud-init support
2. Go to Cloud-Init tab in VM settings
3. Copy contents of `cloud-init-user-data.yaml`
4. Paste into "User Data" field
5. Start VM - fully provisioned automatically

**Benefits**:
- Zero-touch provisioning
- Consistent configuration across all VMs
- No password authentication ever enabled
- Faster deployment than post-install script
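For orientation, a stripped-down user-data file of the kind described above might look like the following. This is a sketch, not the contents of the real `cloud-init-user-data.yaml`; the key material and package name are placeholders:

```yaml
#cloud-config
users:
  - name: cal
    groups: [sudo, docker]
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL
    ssh_authorized_keys:
      - ssh-rsa AAAA...  # homelab_rsa.pub (placeholder)
      - ssh-rsa AAAA...  # emergency_homelab_rsa.pub (placeholder)
ssh_pwauth: false        # never enable password auth
package_update: true
packages:
  - docker.io            # distro package name varies; placeholder
```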

### Docker AppArmor Fix for LXC
**Script**: `fix-docker-apparmor.sh`
**Purpose**: Add `apparmor=unconfined` to docker-compose.yml files for LXC compatibility

**Why Needed**: Docker containers inside LXC containers require AppArmor to be disabled. Without this fix, containers may fail to start or have permission issues.

**Usage**:
```bash
./fix-docker-apparmor.sh <LXC_IP> [COMPOSE_DIR]

# Example - use default directory (/home/cal/container-data)
./fix-docker-apparmor.sh 10.10.0.214

# Example - specify custom directory
./fix-docker-apparmor.sh 10.10.0.214 /home/cal/docker
```

**What It Does**:
1. SSHs into the LXC container
2. Finds all docker-compose.yml files in the specified directory
3. Adds `security_opt: ["apparmor=unconfined"]` to each service
4. Creates `.bak` backups of original files before modification

**Safety Features**:
- Creates backups before modifications
- Color-coded output for easy monitoring
- Error handling with detailed logging
- Validates SSH connectivity before proceeding
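Conceptually, the transformation the script applies is small. A sketch of it in dict form (assuming the compose file has already been parsed with a YAML loader; the shipped script itself edits the files in place over SSH):

```python
def add_apparmor_unconfined(compose: dict) -> dict:
    """Ensure every service carries security_opt: ["apparmor=unconfined"]."""
    for service in compose.get("services", {}).values():
        opts = service.setdefault("security_opt", [])
        if "apparmor=unconfined" not in opts:
            opts.append("apparmor=unconfined")
    return compose
```

The idempotency check (skip services that already have the option) is what makes re-running the fix safe.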

### LXC Container Creation with Docker
**Script**: `lxc-docker-create.sh`
**Purpose**: Create new LXC containers pre-configured for Docker hosting

**Key Features**:
- Automated LXC container creation in Proxmox
- Docker and Docker Compose pre-installed
- AppArmor configuration for container compatibility
- Network configuration
- Security settings optimized for Docker hosting

**Usage**:
```bash
./lxc-docker-create.sh [options]
```

**Common Use Cases**:
- Creating Docker hosts for specific services (n8n, gitea, etc.)
- Rapid deployment of containerized applications
- Consistent LXC configuration across infrastructure

### LXC Migration Guide
**Document**: `LXC-MIGRATION-GUIDE.md`
**Purpose**: Step-by-step procedures for migrating LXC containers between hosts

**Covers**:
- Pre-migration planning and backups
- LXC configuration export/import
- Storage migration strategies
- Network reconfiguration
- Post-migration validation
- Rollback procedures

**When to Use**:
- Moving containers to new Proxmox host
- Hardware upgrades
- Load balancing across nodes
- Disaster recovery scenarios

## Operational Patterns

### VM Provisioning Workflow

**Option 1: New VMs (Preferred)**
```bash
# 1. Create VM in Proxmox with cloud-init support
# 2. Copy cloud-init-user-data.yaml to User Data field
# 3. Start VM
# 4. Verify provisioning completed:
ssh cal@<vm-ip>
docker --version
```

**Option 2: Existing VMs**
```bash
# 1. Ensure VM has SSH enabled and accessible
# 2. Run post-install script:
./vm-post-install.sh 10.10.0.100 cal

# 3. Verify provisioning:
ssh cal@10.10.0.100
docker --version
```

### LXC Docker Deployment Workflow

**Creating New LXC for Docker**:
```bash
# 1. Create LXC container
./lxc-docker-create.sh --id 220 --hostname docker-app --ip 10.10.0.220

# 2. Deploy docker-compose configurations
scp -r ./app-config/ cal@10.10.0.220:/home/cal/container-data/

# 3. Fix AppArmor compatibility
./fix-docker-apparmor.sh 10.10.0.220

# 4. Start containers
ssh cal@10.10.0.220 "cd /home/cal/container-data/app-config && docker compose up -d"
```

**Existing LXC with Docker Issues**:
```bash
# If containers are failing to start in LXC:
./fix-docker-apparmor.sh <LXC_IP>

# Restart affected containers
ssh cal@<LXC_IP> "cd /home/cal/container-data && docker compose restart"
```

### SSH Key Integration

**Both provisioning methods use**:
- **Primary Key**: `~/.ssh/homelab_rsa` - Daily use authentication
- **Emergency Key**: `~/.ssh/emergency_homelab_rsa` - Backup access

**Security Configuration**:
- Password authentication completely disabled after provisioning
- Only key-based SSH access allowed
- Emergency keys provide backup access if primary key fails
- Automatic security updates enabled

**Key Management Integration**:
- Keys managed by `/networking/scripts/ssh_key_maintenance.sh`
- Monthly backups of all SSH keys
- Rotation recommendations for keys > 365 days old

## Configuration Dependencies

### Required Local Files
- `~/.ssh/homelab_rsa` - Primary SSH private key
- `~/.ssh/homelab_rsa.pub` - Primary SSH public key
- `~/.ssh/emergency_homelab_rsa` - Emergency SSH private key
- `~/.ssh/emergency_homelab_rsa.pub` - Emergency SSH public key

### Target VM Requirements
- **For post-install script**: SSH enabled, initial authentication method available
- **For cloud-init**: Proxmox cloud-init support, fresh VM
- **For LXC**: Proxmox host with LXC support

### Network Requirements
- VMs/LXCs on 10.10.0.0/24 network (homelab subnet)
- SSH access (port 22) to target systems
- Internet access on target systems for package installation

## Troubleshooting Context

### Common Issues

**1. vm-post-install.sh Connection Failures**
```bash
# Verify VM is accessible
ping <vm-ip>
nc -z <vm-ip> 22

# Check SSH service on target
ssh <vm-ip> "systemctl status sshd"

# Verify SSH keys exist locally
ls -la ~/.ssh/homelab_rsa*
```

**2. Cloud-Init Not Working**
```bash
# On Proxmox host, check cloud-init support
qm cloudinit dump <vmid>

# On VM, check cloud-init logs
sudo cloud-init status --long
sudo cat /var/log/cloud-init.log
```

**3. Docker Containers Fail in LXC**
```bash
# Symptom: Containers won't start, permission errors
# Solution: Run AppArmor fix
./fix-docker-apparmor.sh <LXC_IP>

# Verify security_opt was added
ssh cal@<LXC_IP> "grep -r 'security_opt' ~/container-data/"

# Check Docker logs
ssh cal@<LXC_IP> "docker compose logs"
```

**4. SSH Key Authentication Fails After Provisioning**
```bash
# Verify key permissions
ls -la ~/.ssh/homelab_rsa
chmod 600 ~/.ssh/homelab_rsa

# Check authorized_keys on target
ssh <vm-ip> "cat ~/.ssh/authorized_keys"

# Test with verbose output
ssh -v cal@<vm-ip>
```

**5. Docker Installation Issues**
```bash
# Check internet connectivity on VM
ssh <vm-ip> "ping -c 3 8.8.8.8"

# Verify Docker GPG key
ssh <vm-ip> "apt-key list | grep -i docker"

# Check Docker service
ssh <vm-ip> "systemctl status docker"

# Manual Docker install if needed
ssh <vm-ip> "curl -fsSL https://get.docker.com | sh"
```

### Diagnostic Commands

```bash
# Post-provisioning validation
ssh cal@<vm-ip> "groups"  # Should include: sudo docker
ssh cal@<vm-ip> "docker run --rm hello-world"
ssh cal@<vm-ip> "sudo sshd -T | grep passwordauth"  # Should be "no"

# Cloud-init status check
ssh cal@<vm-ip> "cloud-init status"
ssh cal@<vm-ip> "cloud-init query -f '{{ds.meta_data.hostname}}'"

# Docker in LXC verification
ssh cal@<LXC_IP> "docker info | grep -i apparmor"
ssh cal@<LXC_IP> "docker compose config"  # Validate compose files

# SSH key connectivity test
ssh -o ConnectTimeout=5 cal@<vm-ip> "echo 'SSH OK'"
```
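The `sshd -T` check above is easy to automate in a validation script. A sketch that parses the effective config and confirms password authentication is off (an illustrative helper, not part of the shipped scripts):

```python
def password_auth_disabled(sshd_t_output: str) -> bool:
    """Parse `sshd -T` output; True only if passwordauthentication is 'no'."""
    for line in sshd_t_output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key == "passwordauthentication":
            return value == "no"
    return False  # directive missing - treat as not verified
```

Feed it the captured output of `ssh <vm-ip> "sudo sshd -T"` after provisioning.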

## Integration Points

### External Dependencies
- **Proxmox VE**: For VM/LXC creation and cloud-init support
- **SSH**: For remote provisioning and management
- **Docker**: Installed on target systems
- **Cloud-init**: For automated VM provisioning
- **AppArmor**: Security framework (configured for LXC compatibility)

### File System Dependencies
- **Script Directory**: `/mnt/NV2/Development/claude-home/vm-management/scripts/`
- **SSH Keys**: `~/.ssh/homelab_rsa*`, `~/.ssh/emergency_homelab_rsa*`
- **LXC Compose Directories**: Typically `/home/cal/container-data/` on target
- **Backup Files**: `.bak` files created by AppArmor fix script

### Network Dependencies
- **Management Network**: 10.10.0.0/24 subnet
- **Internet Access**: Required for package installation
- **Proxmox API**: For LXC creation operations
- **DNS**: For hostname resolution

## Security Considerations

### SSH Security
- Password authentication disabled after provisioning
- Only key-based authentication allowed
- Emergency keys provide backup access
- Root login disabled

### Docker Security
- User in docker group (no sudo needed for docker commands)
- AppArmor unconfined in LXC (required for functionality)
- Containers run as non-root when possible
- Network isolation via Docker networks

### VM/LXC Security
- Automatic security updates enabled
- Minimal package installation (only essentials)
- Firewall configuration recommended post-provisioning
- Regular key rotation via SSH key maintenance

## Performance Considerations

### Cloud-Init vs Post-Install
- **Cloud-init**: Faster (zero-touch), no manual SSH needed, better for multiple VMs
- **Post-install**: More flexible, works with existing VMs, easier debugging

### LXC vs VM
- **LXC**: Lower overhead, faster startup, shared kernel
- **VM**: Better isolation, GPU passthrough support, different kernels possible

### Docker in LXC
- **Performance**: Near-native, minimal overhead with AppArmor disabled
- **I/O**: Use local storage for best performance, NFS for shared data
- **Networking**: Bridge mode for simplicity, macvlan for direct network access

## Related Documentation

- **Technology Overview**: `/vm-management/CONTEXT.md`
- **Troubleshooting**: `/vm-management/troubleshooting.md`
- **Examples**: `/vm-management/examples/` - Configuration templates
- **SSH Management**: `/networking/scripts/ssh_key_maintenance.sh`
- **Docker Patterns**: `/docker/CONTEXT.md`
- **Main Instructions**: `/CLAUDE.md` - Context loading rules

## Notes

These scripts form the foundation of the homelab VM and LXC provisioning strategy. They ensure consistent configuration, security hardening, and Docker compatibility across all virtualized infrastructure.

The cloud-init approach is preferred for new deployments due to zero-touch provisioning, while the post-install script provides flexibility for existing systems or troubleshooting scenarios.

AppArmor configuration is critical for Docker-in-LXC deployments and should be applied to all LXC containers running Docker to prevent container startup failures.