docs: add YAML frontmatter to all 151 markdown files

Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.

Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-12 09:00:44 -05:00

2.9 KiB

Media Tools Scripts

Operational scripts for media downloading and management.

Scripts

pokeflix_scraper.py

Downloads Pokemon episodes from pokeflix.tv.

Dependencies:

pip install playwright yt-dlp
playwright install chromium

Quick Start:

# Download entire season
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/

# Download episodes 1-10 only
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --start 1 --end 10

# Resume interrupted download
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --output ~/Pokemon/ \
    --resume

# Dry run (extract URLs, don't download)
python pokeflix_scraper.py \
    --url "https://www.pokeflix.tv/browse/pokemon-indigo-league" \
    --dry-run --verbose

CLI Options:

Option	Description
`--url, -u`	Season page URL (required)
`--output, -o`	Output directory (default: ~/Downloads/Pokemon)
`--start, -s`	First episode number to download
`--end, -e`	Last episode number to download
`--resume, -r`	Resume from previous state
`--dry-run, -n`	Extract URLs only, no download
`--headless`	Run browser without visible window
`--verbose, -v`	Enable debug logging
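
The options table above could be expressed with argparse roughly as follows. This is a minimal sketch, not the script's actual parser: the function name build_parser, the help strings, and the choice to expand "~" in the output path are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="pokeflix_scraper.py")
    parser.add_argument("--url", "-u", required=True, help="Season page URL")
    parser.add_argument(
        "--output", "-o",
        # Assumption: "~" is expanded here; the real script may do this elsewhere.
        type=lambda p: Path(p).expanduser(),
        default=Path("~/Downloads/Pokemon").expanduser(),
        help="Output directory",
    )
    parser.add_argument("--start", "-s", type=int, default=None,
                        help="First episode number to download")
    parser.add_argument("--end", "-e", type=int, default=None,
                        help="Last episode number to download")
    parser.add_argument("--resume", "-r", action="store_true",
                        help="Resume from previous state")
    parser.add_argument("--dry-run", "-n", action="store_true",
                        help="Extract URLs only, no download")
    parser.add_argument("--headless", action="store_true",
                        help="Run browser without visible window")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable debug logging")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Because --headless is a store_true flag in this sketch, omitting it leaves the browser window visible, which matches the documented default.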

Output Structure:

~/Pokemon/
└── Pokemon Indigo League/
    ├── E01 - Pokemon I Choose You.mp4
    ├── E02 - Pokemon Emergency.mp4
    ├── E03 - Ash Catches a Pokemon.mp4
    └── download_state.json
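
The naming pattern in the listing above can be sketched as a small helper. The function name episode_path is hypothetical, and the two-digit zero padding is inferred from the example filenames; how the real script names episodes numbered 100 or higher, or sanitizes titles, is not documented here.

```python
from pathlib import Path


def episode_path(output_dir: Path, season_name: str, number: int, title: str) -> Path:
    """Return <output>/<season>/E<NN> - <title>.mp4 (hypothetical helper)."""
    # Assumption: episode numbers are zero-padded to at least two digits.
    filename = f"E{number:02d} - {title}.mp4"
    return output_dir / season_name / filename


if __name__ == "__main__":
    print(episode_path(Path("~/Pokemon").expanduser(),
                       "Pokemon Indigo League", 1, "Pokemon I Choose You"))
```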

State File:

The download_state.json file in each season directory tracks per-episode progress:

{
  "season_url": "https://...",
  "season_name": "Pokemon Indigo League",
  "episodes": {
    "1": {
      "number": 1,
      "title": "Pokemon I Choose You",
      "page_url": "https://...",
      "video_url": "https://...",
      "downloaded": true,
      "error": null
    }
  },
  "last_updated": "2025-01-22T..."
}
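
As an illustration of how this file might be inspected, the sketch below lists the episodes that are not yet marked as downloaded. The helper names load_state and pending_episodes are hypothetical and are not part of pokeflix_scraper.py; only the field names come from the example above.

```python
import json
from pathlib import Path


def load_state(state_path: Path) -> dict:
    """Read download_state.json into a dict."""
    return json.loads(state_path.read_text(encoding="utf-8"))


def pending_episodes(state: dict) -> list[int]:
    """Return sorted episode numbers whose "downloaded" flag is not true."""
    pending = [
        int(key)  # keys are episode numbers stored as strings, e.g. "1"
        for key, episode in state.get("episodes", {}).items()
        if not episode.get("downloaded")
    ]
    return sorted(pending)


if __name__ == "__main__":
    path = Path("~/Pokemon/Pokemon Indigo League/download_state.json").expanduser()
    if path.exists():
        print(pending_episodes(load_state(path)))
```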

Adding New Scrapers

To add a scraper for a new site:

1. Copy the pattern from pokeflix_scraper.py.
2. Modify the selectors for episode list extraction.
3. Modify the iframe/video URL selectors to match the new site's player.
4. Test with --dry-run first.

Key methods to customize:

get_season_info() - Extract episode list from season page
extract_video_url() - Get video URL from episode page
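
One way to picture the two customization points is the skeleton below. Only the method names get_season_info and extract_video_url come from this document; the signatures, the Episode dataclass, the BaseScraper class, and the canned data in ExampleSiteScraper are all assumptions made for illustration, and a real scraper would replace the placeholder bodies with browser-driven selector logic.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    """Hypothetical episode record; fields mirror the state file example."""
    number: int
    title: str
    page_url: str
    video_url: Optional[str] = None


class BaseScraper(ABC):
    """Hypothetical base class; the real script's structure may differ."""

    @abstractmethod
    def get_season_info(self, season_url: str) -> list[Episode]:
        """Extract the episode list from the season page."""

    @abstractmethod
    def extract_video_url(self, episode: Episode) -> Optional[str]:
        """Get the video URL from an episode page."""

    def dry_run(self, season_url: str) -> list[Episode]:
        """Resolve video URLs without downloading anything."""
        episodes = self.get_season_info(season_url)
        for episode in episodes:
            episode.video_url = self.extract_video_url(episode)
        return episodes


class ExampleSiteScraper(BaseScraper):
    """Placeholder subclass returning canned data instead of scraping."""

    def get_season_info(self, season_url: str) -> list[Episode]:
        return [Episode(1, "Example Episode", f"{season_url}/1")]

    def extract_video_url(self, episode: Episode) -> Optional[str]:
        return f"{episode.page_url}/video.mp4"


if __name__ == "__main__":
    for ep in ExampleSiteScraper().dry_run("https://example.invalid/season"):
        print(ep)
```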

Performance Notes

Non-headless mode (the default) is recommended to reduce the chance of anti-bot detection
Random delays (2-5s) between requests help avoid rate limiting
Large seasons (80+ episodes) may take hours; use --resume if a run is interrupted
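
The random delay between requests can be sketched as below. The function name polite_delay and its injectable sleep parameter are assumptions made for illustration; only the 2-5 second range comes from the notes above.

```python
import random
import time
from typing import Callable


def polite_delay(
    min_s: float = 2.0,
    max_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random duration in [min_s, max_s] and return it."""
    delay = random.uniform(min_s, max_s)
    # The sleep function is injectable so callers can skip real waiting in tests.
    sleep(delay)
    return delay


if __name__ == "__main__":
    waited = polite_delay()
    print(f"waited {waited:.1f}s before the next request")
```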
