claude-plugins/plugins/youtube-transcriber/commands/transcribe.md
Cal Corum 51fe634ff5 refactor: convert 5 more skills to commands, update transcriber defaults
Convert backlog, project-plan, save-doc, youtube-transcriber, and
z-image from skills/ to commands/ so they appear as user-invocable
slash commands with plugin name prefixes.

Update youtube-transcriber: switch default model from gpt-4o-transcribe
to gpt-4o-mini-transcribe (OpenAI's current recommendation, half cost)
and fix cost estimates that were 4-7x too high.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 14:41:37 -05:00

description: Transcribe YouTube videos of any length using GPT-4o-mini-transcribe

YouTube Transcriber - High-Quality Video Transcription

Script Location

Primary script: $YOUTUBE_TRANSCRIBER_DIR/transcribe.py

Key Features

  • Parallel processing: Multiple videos can be transcribed simultaneously
  • Unlimited length: Auto-chunks videos longer than 10 minutes to stay within API limits
  • Organized output:
    • Transcripts → output/ directory
    • Temp files → temp/ directory (auto-cleaned)
  • High quality: Uses GPT-4o-mini-transcribe by default (OpenAI's recommended model)
  • Cost options: Can use -m gpt-4o-transcribe for the full-size model at 2x cost

Basic Usage

Single Video

cd "$YOUTUBE_TRANSCRIBER_DIR"
uv run python transcribe.py "https://youtube.com/watch?v=VIDEO_ID"

Output: output/Video_Title_2025-11-10.txt

Multiple Videos in Parallel

cd "$YOUTUBE_TRANSCRIBER_DIR"

# Launch all in background simultaneously
uv run python transcribe.py "URL1" &
uv run python transcribe.py "URL2" &
uv run python transcribe.py "URL3" &
wait

Why parallel works: Each transcription writes to its own UUID-named temp files in the temp/ directory, so concurrent runs do not overwrite each other's audio.

Higher Quality Mode

cd "$YOUTUBE_TRANSCRIBER_DIR"
uv run python transcribe.py "URL" -m gpt-4o-transcribe

When to use full model: Noisy audio, critical accuracy requirements. Costs 2x more ($0.006/min vs $0.003/min).

Command Options

uv run python transcribe.py [URL] [OPTIONS]

Options:
  -o, --output PATH          Custom output filename (default: auto-generated in output/)
  -m, --model MODEL          Transcription model (default: gpt-4o-mini-transcribe)
                             Options: gpt-4o-mini-transcribe, gpt-4o-transcribe, whisper-1
  -p, --prompt TEXT          Context prompt for better accuracy
  --chunk-duration MINUTES   Chunk size for long videos (default: 10 minutes)
  --keep-audio               Keep temp audio files (default: auto-delete)
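
The option list above could be declared with Python's argparse roughly as follows. This is only a sketch: the source of transcribe.py is not shown here, so the function name build_parser and the attribute names on the parsed result are assumptions. Only the flags, choices, and defaults come from the list above.

```python
import argparse

# Model names and the default are taken from the documented option list.
MODELS = ["gpt-4o-mini-transcribe", "gpt-4o-transcribe", "whisper-1"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="transcribe.py")
    parser.add_argument("url", help="YouTube video URL")
    parser.add_argument("-o", "--output", default=None,
                        help="Custom output filename (default: auto-generated in output/)")
    parser.add_argument("-m", "--model", choices=MODELS, default=MODELS[0],
                        help="Transcription model")
    parser.add_argument("-p", "--prompt", default=None,
                        help="Context prompt for better accuracy")
    parser.add_argument("--chunk-duration", type=int, default=10,
                        help="Chunk size in minutes for long videos")
    parser.add_argument("--keep-audio", action="store_true",
                        help="Keep temp audio files instead of deleting them")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["https://youtube.com/watch?v=VIDEO_ID", "-m", "gpt-4o-transcribe"]
)
print(args.model, args.chunk_duration, args.keep_audio)
```

Restricting -m with choices means a mistyped model name is rejected before any audio is downloaded.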

Workflow for User Requests

Single Video Request

  1. Change to transcriber directory
  2. Run script with URL
  3. Report output file location in output/ directory

Multiple Video Request

  1. Change to transcriber directory
  2. Launch all transcriptions in parallel using background processes
  3. Wait for all to complete
  4. Report all output files in output/ directory
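
The multi-video workflow can also be driven from Python instead of shell background jobs. The sketch below assumes uv is on the PATH and that workdir points at the transcriber directory. The helper names build_command and transcribe_all are hypothetical and not part of the script.

```python
import subprocess
from typing import Optional


def build_command(url: str, model: Optional[str] = None) -> list[str]:
    """Build the documented `uv run python transcribe.py URL [-m MODEL]` command."""
    cmd = ["uv", "run", "python", "transcribe.py", url]
    if model:
        cmd += ["-m", model]
    return cmd


def transcribe_all(urls: list[str], workdir: str) -> list[int]:
    """Start one process per URL in workdir, wait for all, and return exit codes."""
    procs = [subprocess.Popen(build_command(u), cwd=workdir) for u in urls]
    return [p.wait() for p in procs]
```

Calling transcribe_all(["URL1", "URL2"], os.environ["YOUTUBE_TRANSCRIBER_DIR"]) (after import os) would mirror the background-job pattern. Passing the command as a list also avoids shell quoting problems with the ? and & characters found in YouTube URLs.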

Testing/Cost-Conscious Request

  1. Default model (gpt-4o-mini-transcribe) is already the cheapest GPT-4o option
  2. For an even cheaper option, suggest Groq's Whisper API as an alternative
  3. Quality is excellent for most YouTube content

Technical Details

How it works:

  1. Downloads audio from YouTube (via yt-dlp)
  2. Saves to unique temp file: temp/download_{UUID}.mp3
  3. Splits long videos (>10 min) into chunks automatically
  4. Transcribes with the OpenAI API (gpt-4o-mini-transcribe by default)
  5. Saves transcript: output/Video_Title_YYYY-MM-DD.txt
  6. Cleans up temp files automatically
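
Step 5 names the transcript after the video title and the date. How the script sanitizes titles is not documented, so the sketch below assumes that runs of non-alphanumeric characters become a single underscore. That rule reproduces the Video_Title_2025-11-10.txt example but may differ from the real behavior. The function name output_path is hypothetical.

```python
import re
from datetime import date
from pathlib import Path


def output_path(title: str, day: date, out_dir: str = "output") -> Path:
    """Return <out_dir>/<Sanitized_Title>_<YYYY-MM-DD>.txt (sanitizing rule is assumed)."""
    # Collapse anything that is not a letter or digit into one underscore.
    safe = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_") or "untitled"
    return Path(out_dir) / f"{safe}_{day.isoformat()}.txt"


print(output_path("Video Title", date(2025, 11, 10)).as_posix())
```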

Parallel safety:

  • Each process uses UUID-based temp files
  • No file conflicts between parallel processes
  • Temp files auto-cleaned after completion
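
The parallel-safety property comes from giving every run its own temp filename. A minimal sketch of that idea, assuming a random UUID (uuid4) in hex form; the document only specifies the temp/download_{UUID}.mp3 pattern, and the function name temp_audio_path is hypothetical:

```python
import uuid
from pathlib import Path


def temp_audio_path(temp_dir: str = "temp") -> Path:
    """Return a fresh <temp_dir>/download_<uuid>.mp3 path for one transcription run."""
    # uuid4 is random, so two concurrent processes are extremely unlikely to collide.
    return Path(temp_dir) / f"download_{uuid.uuid4().hex}.mp3"


first, second = temp_audio_path(), temp_audio_path()
print(first != second)
```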

Auto-chunking:

  • Videos >10 minutes: Split into 10-minute chunks (size configurable with --chunk-duration)
  • Context preserved between chunks
  • Prevents API response truncation
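
The chunk boundaries for a given duration can be computed as below. This is an illustrative sketch, not the script's code: chunk_ranges is a hypothetical name, durations are assumed to be in seconds, and the context carry-over between chunks is not modeled because the document does not say how it works.

```python
def chunk_ranges(duration_s: float, chunk_minutes: int = 10) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) ranges of at most chunk_minutes."""
    if chunk_minutes <= 0:
        raise ValueError("chunk_minutes must be positive")
    step = chunk_minutes * 60
    ranges = []
    start = 0.0
    while start < duration_s:
        # The last chunk is shortened so it ends exactly at the video's end.
        end = min(start + step, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


# A 25-minute video yields two full 10-minute chunks and one 5-minute chunk.
print(chunk_ranges(25 * 60))
```

A video of exactly 10 minutes or less produces a single range, which matches the rule that only videos longer than 10 minutes are split.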

Requirements

  • OpenAI API key: $OPENAI_API_KEY environment variable
  • Python 3.10+ with uv package manager
  • FFmpeg (for audio processing)
  • yt-dlp (for YouTube downloads)

Check requirements:

[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set"  # Confirms the key exists without printing it
which ffmpeg          # Should show path
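
The same checks can be scripted in Python. The sketch below is an assumption-laden convenience rather than part of the tool: missing_requirements is a hypothetical helper, and it skips yt-dlp because that may be installed as a Python dependency rather than as a command on the PATH.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which) -> list[str]:
    """Return the names of documented requirements that cannot be found."""
    env = os.environ if env is None else env
    missing = []
    # Only test for presence; the key itself is never printed.
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    for tool in ("ffmpeg", "uv"):
        if which(tool) is None:
            missing.append(tool)
    return missing


print(missing_requirements() or "all requirements found")
```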

Cost Estimates

Default model (gpt-4o-mini-transcribe at $0.003/min):

  • 5-minute video: ~$0.015
  • 25-minute video: ~$0.075
  • 60-minute video: ~$0.18

Full model (gpt-4o-transcribe at $0.006/min):

  • 5-minute video: ~$0.03
  • 25-minute video: ~$0.15
  • 60-minute video: ~$0.36
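
The figures above are simply duration multiplied by the per-minute rate. A small sketch for estimating other lengths, using only the two rates quoted in this document (estimate_cost is a hypothetical helper, and whisper-1 is left out because no rate is given for it here):

```python
# Per-minute rates in USD, as quoted in this document.
RATES_PER_MIN = {
    "gpt-4o-mini-transcribe": 0.003,
    "gpt-4o-transcribe": 0.006,
}


def estimate_cost(minutes: float, model: str = "gpt-4o-mini-transcribe") -> float:
    """Return the estimated cost in USD, rounded to four decimal places."""
    return round(minutes * RATES_PER_MIN[model], 4)


for minutes in (5, 25, 60):
    print(minutes, estimate_cost(minutes), estimate_cost(minutes, "gpt-4o-transcribe"))
```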

Quick Reference

# Single video (default quality)
uv run python transcribe.py "URL"

# Single video (higher quality, 2x cost)
uv run python transcribe.py "URL" -m gpt-4o-transcribe

# Multiple videos in parallel
for url in "URL1" "URL2" "URL3"; do  # Quote each URL: ? and & are special to the shell
  uv run python transcribe.py "$url" &
done
wait

# With custom output
uv run python transcribe.py "URL" -o custom_name.txt

# With context prompt
uv run python transcribe.py "URL" -p "Context about video content"

Integration

With fabric: Process transcripts after generation

cat output/Video_Title_2025-11-10.txt | fabric -p extract_wisdom

Notes

  • The script must be run from its own directory, so always change to $YOUTUBE_TRANSCRIBER_DIR first
  • Parallel execution is safe and recommended for multiple videos
  • Default model (gpt-4o-mini-transcribe) is recommended for most content
  • Output files are named automatically from the video title and date
  • Temp files are cleaned up automatically after transcription