claude-configs/skills/youtube-transcriber/SKILL.md

name: youtube-transcriber
description: Transcribe YouTube videos of any length using OpenAI's GPT-4o-transcribe model. Supports parallel processing for multiple videos and automatic chunking for long content. USE WHEN user says 'transcribe video', 'transcribe youtube', 'get transcript', or provides YouTube URLs.

YouTube Transcriber - High-Quality Video Transcription

When to Activate This Skill

  • "Transcribe this YouTube video"
  • "Get a transcript of [URL]"
  • "Transcribe these videos" (multiple URLs)
  • User provides YouTube URL(s) needing transcription
  • "Extract text from video"
  • Any request involving YouTube video transcription

Script Location

Primary script: /mnt/NV2/Development/youtube-transcriber/transcribe.py

Key Features

  • Parallel processing: Multiple videos can be transcribed simultaneously
  • Unlimited length: Auto-chunks videos longer than 10 minutes to stay within API limits and avoid response truncation
  • Organized output:
    • Transcripts → output/ directory
    • Temp files → temp/ directory (auto-cleaned)
  • High quality: Uses GPT-4o-transcribe by default (reduced hallucinations)
  • Cost options: Can use -m gpt-4o-mini-transcribe for 50% cost savings

Basic Usage

Single Video

cd /mnt/NV2/Development/youtube-transcriber
uv run python transcribe.py "https://youtube.com/watch?v=VIDEO_ID"

Output: output/Video_Title_2025-11-10.txt

Multiple Videos in Parallel

cd /mnt/NV2/Development/youtube-transcriber

# Launch all in background simultaneously
uv run python transcribe.py "URL1" &
uv run python transcribe.py "URL2" &
uv run python transcribe.py "URL3" &
wait

Why parallel works: Each transcription uses unique temp files (UUID-based) in temp/ directory.
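
The UUID scheme described here can be sketched as follows; the `temp/` directory and `download_` prefix come from the Technical Details section, while the helper name itself is hypothetical:

```python
import uuid
from pathlib import Path

def make_temp_path(temp_dir: str = "temp") -> Path:
    """Build a collision-free audio path like temp/download_<32 hex chars>.mp3.

    Each call draws a fresh UUID4, so parallel processes never share a file.
    """
    return Path(temp_dir) / f"download_{uuid.uuid4().hex}.mp3"
```

Because every invocation draws a new UUID, two transcriptions started at the same instant still write to different files, which is what makes the `&`/`wait` pattern safe.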

Cost-Saving Mode

cd /mnt/NV2/Development/youtube-transcriber
uv run python transcribe.py "URL" -m gpt-4o-mini-transcribe

When to use mini: Testing, casual content, bulk processing. Quality is close to gpt-4o-transcribe's (the full model hallucinates slightly less) at roughly half the cost.

Command Options

uv run python transcribe.py [URL] [OPTIONS]

Options:
  -o, --output PATH          Custom output filename (default: auto-generated in output/)
  -m, --model MODEL          Transcription model (default: gpt-4o-transcribe)
                             Options: gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1
  -p, --prompt TEXT          Context prompt for better accuracy
  --chunk-duration MINUTES   Chunk size for long videos (default: 10 minutes)
  --keep-audio               Keep temp audio files (default: auto-delete)
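
Assuming the script uses `argparse`, the option table above corresponds roughly to this parser — a sketch of the documented CLI surface, not the actual source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Mirror of the documented CLI options; defaults match the table above."""
    p = argparse.ArgumentParser(prog="transcribe.py")
    p.add_argument("url", help="YouTube video URL")
    p.add_argument("-o", "--output", help="custom output filename")
    p.add_argument("-m", "--model", default="gpt-4o-transcribe",
                   choices=["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"])
    p.add_argument("-p", "--prompt", help="context prompt for better accuracy")
    p.add_argument("--chunk-duration", type=int, default=10,
                   help="chunk size in minutes for long videos")
    p.add_argument("--keep-audio", action="store_true",
                   help="keep temp audio files instead of auto-deleting")
    return p
```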

Workflow for User Requests

Single Video Request

  1. Change to transcriber directory
  2. Run script with URL
  3. Report output file location in output/ directory

Multiple Video Request

  1. Change to transcriber directory
  2. Launch all transcriptions in parallel using background processes
  3. Wait for all to complete
  4. Report all output files in output/ directory

Testing/Cost-Conscious Request

  1. Always use -m gpt-4o-mini-transcribe for testing
  2. Mention cost savings to user
  3. Quality is close to the full model's, at about half the cost

Example Responses

User: "Transcribe this video: https://youtube.com/watch?v=abc123"

Assistant Action:

cd /mnt/NV2/Development/youtube-transcriber
uv run python transcribe.py "https://youtube.com/watch?v=abc123"

Report: "Transcript saved to output/Video_Title_2025-11-10.txt"


User: "Transcribe these 5 videos: [URL1] [URL2] [URL3] [URL4] [URL5]"

Assistant Action: Launch all 5 in parallel:

cd /mnt/NV2/Development/youtube-transcriber
uv run python transcribe.py "URL1" &
uv run python transcribe.py "URL2" &
uv run python transcribe.py "URL3" &
uv run python transcribe.py "URL4" &
uv run python transcribe.py "URL5" &
wait

Report: "All 5 videos transcribed successfully in parallel. Output files are in the output/ directory."

Technical Details

How it works:

  1. Downloads audio from YouTube (via yt-dlp)
  2. Saves to unique temp file: temp/download_{UUID}.mp3
  3. Splits long videos (>10 min) into chunks automatically
  4. Transcribes with OpenAI API (GPT-4o-transcribe)
  5. Saves transcript: output/Video_Title_YYYY-MM-DD.txt
  6. Cleans up temp files automatically
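
The naming in step 5 can be sketched as plain string handling. Only the `Title_YYYY-MM-DD.txt` pattern comes from this document; the sanitizer and its exact rules are assumptions:

```python
import re
from datetime import date
from pathlib import Path

def output_path(title: str, when: date, out_dir: str = "output") -> Path:
    """Turn a video title into output/Safe_Title_YYYY-MM-DD.txt."""
    # Collapse spaces and punctuation into underscores (hypothetical rule).
    safe = re.sub(r"[^\w]+", "_", title).strip("_")
    return Path(out_dir) / f"{safe}_{when.isoformat()}.txt"
```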

Parallel safety:

  • Each process uses UUID-based temp files
  • No file conflicts between parallel processes
  • Temp files auto-cleaned after completion
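
The shell `&`/`wait` pattern can also be expressed in Python. Here a stub stands in for the real transcription call, which needs the API key and network access:

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_stub(url: str) -> str:
    """Placeholder for one transcription; returns the pretend output path."""
    return f"output/{url.rsplit('=', 1)[-1]}.txt"

def transcribe_all(urls: list[str]) -> list[str]:
    """Run one worker per URL concurrently, like `cmd & ... wait` in the shell."""
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(transcribe_stub, urls))
```

Exiting the `with` block is the Python equivalent of `wait`: it blocks until every worker has finished.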

Auto-chunking:

  • Videos >10 minutes: Split into 10-minute chunks
  • Context preserved between chunks
  • Prevents API response truncation
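
The chunking above reduces to simple boundary arithmetic over the video duration, with the window size taken from `--chunk-duration` (default 10 minutes):

```python
def chunk_bounds(duration_min: float, chunk_min: int = 10) -> list[tuple[float, float]]:
    """Split a video into (start, end) windows no longer than chunk_min minutes."""
    bounds = []
    start = 0.0
    while start < duration_min:
        end = min(start + chunk_min, duration_min)  # last chunk may be shorter
        bounds.append((start, end))
        start = end
    return bounds
```

A 45.5-minute video thus yields four full 10-minute chunks plus a final 5.5-minute one.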

Requirements

  • OpenAI API key: $OPENAI_API_KEY environment variable
  • Python 3.10+ with uv package manager
  • FFmpeg (for audio processing)
  • yt-dlp (for YouTube downloads)

Check requirements:

echo $OPENAI_API_KEY  # Should show API key
which ffmpeg          # Should show path
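
The same checks can be done from Python, returning whatever is missing; `shutil.which` covers both `ffmpeg` and `yt-dlp` (the helper name is hypothetical):

```python
import os
import shutil

def missing_requirements(env=os.environ, which=shutil.which) -> list[str]:
    """Return names of unmet requirements; an empty list means ready to run."""
    missing = []
    if not env.get("OPENAI_API_KEY"):
        missing.append("OPENAI_API_KEY")
    for tool in ("ffmpeg", "yt-dlp"):
        if which(tool) is None:
            missing.append(tool)
    return missing
```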

Output Format

Transcripts are saved as plain text with metadata:

================================================================================
YouTube Video Transcript (Long Video)
================================================================================

Title: Video Title Here
Uploader: Channel Name
Duration: 45m 32s
URL: https://youtube.com/watch?v=VIDEO_ID

================================================================================

[Full transcript text with proper punctuation...]
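
Building that metadata banner is plain string formatting; a sketch with field names taken from the sample above (the helper itself is hypothetical):

```python
def transcript_header(title: str, uploader: str, duration: str, url: str) -> str:
    """Render the metadata banner shown above the transcript text."""
    rule = "=" * 80
    return (
        f"{rule}\nYouTube Video Transcript (Long Video)\n{rule}\n\n"
        f"Title: {title}\nUploader: {uploader}\n"
        f"Duration: {duration}\nURL: {url}\n\n{rule}\n"
    )
```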

Best Practices

  1. Always use parallel for multiple videos - For N videos it is roughly N times faster, until API rate limits apply
  2. Use mini model for testing - Same quality, half the cost
  3. Check output/ directory - All transcripts organized there
  4. Temp files auto-clean - No manual cleanup needed
  5. Add context prompts for technical content:
    uv run python transcribe.py "URL" \
      -p "Technical discussion about Docker, Kubernetes, microservices"
    

Troubleshooting

API Key Missing:

export OPENAI_API_KEY="sk-proj-your-key-here"

FFmpeg Not Found:

sudo dnf install ffmpeg  # Fedora/Nobara

Parallel Conflicts (shouldn't happen with UUID temps):

  • Each process creates unique temp file in temp/
  • If issues occur, check temp/ directory permissions

Cost Estimates (as of March 2025)

  • 5-minute video: $0.10 - $0.20
  • 25-minute video: $0.50 - $1.00
  • 60-minute video: $1.20 - $2.40

Using mini model: Reduce costs by ~50%
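
The table above implies roughly $0.02-$0.04 per audio minute for the full model. Treating those bounds as assumptions inferred from the table, the estimate is just multiplication:

```python
def estimate_cost(minutes: float, mini: bool = False) -> tuple[float, float]:
    """Rough (low, high) dollar estimate; rates inferred from the table above."""
    low, high = 0.02 * minutes, 0.04 * minutes
    if mini:  # mini model is ~50% cheaper
        low, high = low / 2, high / 2
    return round(low, 2), round(high, 2)
```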

Quick Reference

# Single video (default quality)
uv run python transcribe.py "URL"

# Single video (cost-saving)
uv run python transcribe.py "URL" -m gpt-4o-mini-transcribe

# Multiple videos in parallel
for url in URL1 URL2 URL3; do
  uv run python transcribe.py "$url" &
done
wait

# With custom output
uv run python transcribe.py "URL" -o custom_name.txt

# With context prompt
uv run python transcribe.py "URL" -p "Context about video content"

Directory Structure

/mnt/NV2/Development/youtube-transcriber/
├── transcribe.py       # Main script
├── temp/              # Temporary audio files (auto-cleaned)
├── output/            # All transcripts saved here
├── README.md          # Full documentation
└── pyproject.toml     # Dependencies

Integration with Other Skills

With fabric skill: Process transcripts after generation

# 1. Transcribe
uv run python transcribe.py "URL"

# 2. Process with fabric
cat output/Video_Title_2025-11-10.txt | fabric -p extract_wisdom

With research skill: Transcribe source videos for research

# Transcribe multiple research videos in parallel
# Then analyze transcripts for insights

Notes

  • Script requires being in its directory to work correctly
  • Always change to /mnt/NV2/Development/youtube-transcriber first
  • Parallel execution is safe and recommended for multiple videos
  • Use mini model for testing to save costs
  • Output files automatically named with video title + date
  • Temp files automatically cleaned after transcription