Baseball Reference Pitching Stats Scraper

This script scrapes the Player Standard Pitching table from Baseball Reference and saves it as a CSV file in the specified cardset directory.

Features

  • Scrapes the players_standard_pitching table from Baseball Reference
  • Maps data to the expected CSV format matching existing pitcher-stats.csv files
  • Handles missing data gracefully
  • Includes comprehensive error handling and logging
  • Saves output to the correct cardset directory structure
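The table-scraping step can be sketched roughly as follows. This is a minimal illustration that uses only the standard library, not the script's actual implementation: the `TableExtractor` class and `extract_table` function are hypothetical names, and the sample HTML is a made-up stand-in for the real page. Only the table id `players_standard_pitching` comes from this README.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every row inside <table id=table_id>."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []   # finished rows, each a list of cell strings
        self._row = []   # cells of the row being read
        self._cell = []  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self._row = []
        elif self.in_table and tag in ("th", "td"):
            self.in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("th", "td") and self.in_cell:
            self.in_cell = False
            self._row.append("".join(self._cell).strip())
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = []
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self.in_table and self.in_cell:
            self._cell.append(data)


def extract_table(html, table_id="players_standard_pitching"):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableExtractor(table_id)
    parser.feed(html)
    if not parser.rows:
        raise ValueError(f"table {table_id!r} not found or empty")
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup, used only to exercise the sketch offline.
SAMPLE = """
<table id="players_standard_pitching">
  <thead><tr><th>Name</th><th>Age</th><th>IP</th></tr></thead>
  <tbody>
    <tr><td><a href="#">Example Pitcher</a></td><td>27</td><td>101.2</td></tr>
  </tbody>
</table>
"""
rows = extract_table(SAMPLE)
```

A real page will likely need extra cleanup (for example, dropping any non-player rows) that this sketch does not attempt.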

Installation

pip install -r requirements.txt

Usage

# Basic usage
python pull_pitching_stats.py --year 2025 --cardset-name "2025 Live Cardset"

# With verbose logging
python pull_pitching_stats.py --year 2025 --cardset-name "2025 Live Cardset" --verbose

Parameters

  • --year: The season to scrape (e.g., 2025). The value replaces YYYY in the URL https://www.baseball-reference.com/leagues/majors/YYYY-standard-pitching.shtml
  • --cardset-name: Name of the cardset directory where the CSV will be saved
  • --verbose: Enable verbose logging (optional)
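The parameters above could be wired up along these lines. This is a sketch under assumptions, not the script's actual code: the helper names `build_parser` and `build_url` are hypothetical, and treating `--year` and `--cardset-name` as required is a guess. The flag names and the URL pattern are taken from this README.

```python
import argparse

# URL pattern from the Parameters section; {year} stands in for YYYY.
BASE_URL = "https://www.baseball-reference.com/leagues/majors/{year}-standard-pitching.shtml"


def build_parser():
    """Build a parser for the three documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Scrape the standard pitching table from Baseball Reference"
    )
    # Assumption: both of these are required; the README does not say.
    parser.add_argument("--year", type=int, required=True,
                        help="Season to scrape, e.g. 2025")
    parser.add_argument("--cardset-name", required=True,
                        help="Cardset directory under data-input/")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser


def build_url(year):
    """Substitute the season into the URL pattern."""
    return BASE_URL.format(year=year)


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(
    ["--year", "2025", "--cardset-name", "2025 Live Cardset"]
)
url = build_url(args.year)
```

Note that argparse exposes `--cardset-name` as `args.cardset_name`.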

Output

The script creates:

  • Directory: data-input/{cardset-name}/
  • File: pitching.csv with 41 columns matching the expected format
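One way the output step might look is shown below. The `save_csv` function and its parameters are hypothetical; only the `data-input/{cardset-name}/` layout and the `pitching.csv` filename come from this README. The demo writes into a temporary directory and uses a shortened column list so that it does not touch the repository.

```python
import csv
import tempfile
from pathlib import Path


def save_csv(rows, columns, cardset_name,
             base_dir="data-input", filename="pitching.csv"):
    """Write rows (a list of dicts) to base_dir/cardset_name/filename."""
    out_dir = Path(base_dir) / cardset_name
    out_dir.mkdir(parents=True, exist_ok=True)  # create the cardset directory if needed
    out_path = out_dir / filename
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        # restval fills columns missing from a row with an empty string;
        # extrasaction="ignore" drops keys that are not output columns.
        writer = csv.DictWriter(fh, fieldnames=columns,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Demo with made-up data and only three columns for brevity.
with tempfile.TemporaryDirectory() as tmp:
    path = save_csv(
        [{"Name": "Example Pitcher", "IP": "10.0", "Rk": "1"}],
        ["Name", "IP", "ERA"],
        "2025 Live Cardset",
        base_dir=tmp,
    )
    saved_name = path.name
    saved_dir = path.parent.name
    written = path.read_text(encoding="utf-8")
```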

Column Mapping

The script maps Baseball Reference columns to the expected format:

| BR Column    | Output Column | Notes                                  |
|--------------|---------------|----------------------------------------|
| Name         | Name          | Player name                            |
| Age          | Age           | Player age                             |
| Lev          | Lev           | League level                           |
| Tm           | Tm            | Team                                   |
| G            | G             | Games                                  |
| GS           | GS            | Games started                          |
| W            | W             | Wins                                   |
| L            | L             | Losses                                 |
| SV           | SV            | Saves                                  |
| IP           | IP            | Innings pitched                        |
| H            | H             | Hits allowed                           |
| R            | R             | Runs allowed                           |
| ER           | ER            | Earned runs                            |
| BB           | BB            | Walks                                  |
| SO           | SO            | Strikeouts                             |
| HR           | HR            | Home runs allowed                      |
| HBP          | HBP           | Hit by pitch                           |
| ERA          | ERA           | Earned run average                     |
| AB           | AB            | At bats against                        |
| 2B           | 2B            | Doubles allowed                        |
| 3B           | 3B            | Triples allowed                        |
| IBB          | IBB           | Intentional walks                      |
| GDP          | GDP           | Grounded into double plays             |
| SF           | SF            | Sacrifice flies                        |
| SB           | SB            | Stolen bases allowed                   |
| CS           | CS            | Caught stealing                        |
| PO           | PO            | Pickoffs                               |
| BF           | BF            | Batters faced                          |
| Pit          | Pit           | Pitches thrown                         |
| Str          | Str           | Strikes                                |
| StL          | StL           | Strikes looking                        |
| StS          | StS           | Strikes swinging                       |
| GB/FB        | GB/FB         | Ground ball to fly ball ratio          |
| LD           | LD            | Line drives                            |
| PU           | PU            | Pop ups                                |
| WHIP         | WHIP          | Walks + hits per inning                |
| BAbip        | BAbip         | Batting average on balls in play       |
| SO9          | SO9           | Strikeouts per 9 innings               |
| SO/W         | SO/W          | Strikeout to walk ratio                |
| (calculated) | #days         | Left empty for business logic          |
| (extracted)  | mlbID         | Left empty (could extract from links)  |
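The mapping above can be expressed as a short sketch. The column names and their order are taken from the table; the names `OUTPUT_COLUMNS`, `PLACEHOLDER_COLUMNS`, `map_row`, and `missing_columns` are hypothetical, and filling absent values with an empty string is an assumption about how missing data is handled.

```python
# The 41 output columns, in the order listed in the Column Mapping table.
OUTPUT_COLUMNS = [
    "Name", "Age", "Lev", "Tm", "G", "GS", "W", "L", "SV", "IP",
    "H", "R", "ER", "BB", "SO", "HR", "HBP", "ERA", "AB", "2B",
    "3B", "IBB", "GDP", "SF", "SB", "CS", "PO", "BF", "Pit", "Str",
    "StL", "StS", "GB/FB", "LD", "PU", "WHIP", "BAbip", "SO9", "SO/W",
    "#days", "mlbID",
]

# Columns the table describes as left empty rather than scraped.
PLACEHOLDER_COLUMNS = {"#days", "mlbID"}


def map_row(scraped):
    """Project one scraped row (a dict) onto the expected output columns."""
    mapped = {}
    for col in OUTPUT_COLUMNS:
        if col in PLACEHOLDER_COLUMNS:
            mapped[col] = ""  # left empty, per the table
        else:
            # Assumption: a column absent from the scrape becomes an empty string.
            mapped[col] = scraped.get(col, "")
    return mapped


def missing_columns(scraped_header):
    """List expected columns that the scraped header does not provide."""
    return [c for c in OUTPUT_COLUMNS
            if c not in PLACEHOLDER_COLUMNS and c not in scraped_header]


# Made-up row: "Rk" is not an output column and is dropped.
example = map_row({"Rk": "1", "Name": "Example Pitcher", "IP": "101.2"})
```

Because the columns map one-to-one by name, a list is enough here; a dict would only be needed if the source and output names diverged.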

Error Handling

The script handles the following failure cases:

  • Network timeouts and connection errors
  • Missing table elements
  • Missing columns in scraped data
  • Invalid HTML structure
  • File system errors
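As an illustration of the network cases in this list, a fetch helper might catch timeouts and connection errors like this. It is a sketch that uses `urllib` from the standard library as a stand-in; the script itself may use a different HTTP client. The `fetch_html` function, its `opener` parameter, the User-Agent value, and the logger name are all hypothetical. The demo swaps in fake openers so that nothing is requested over the network.

```python
import io
import logging
import socket
import urllib.error
import urllib.request

logger = logging.getLogger("pull_pitching_stats")  # hypothetical logger name


def fetch_html(url, timeout=30.0, opener=urllib.request.urlopen):
    """Return the page body as text, or None if the request fails."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it must be caught first.
        logger.error("HTTP %s for %s", exc.code, url)
    except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
        logger.error("Network error for %s: %s", url, exc)
    return None


# Offline demo: fake openers stand in for urlopen.
def _fake_timeout(request, timeout):
    raise socket.timeout("timed out")


def _fake_ok(request, timeout):
    return io.BytesIO(b"<html>ok</html>")


failed = fetch_html("https://example.invalid/", opener=_fake_timeout)
succeeded = fetch_html("https://example.invalid/", opener=_fake_ok)
```

Returning `None` keeps the caller's control flow simple; raising a custom exception would be an equally reasonable design.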

Logging

The script provides detailed logging of:

  • URL being scraped
  • Number of columns and rows found
  • Missing columns and data mapping
  • File save location and success status
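A logging setup that honors `--verbose` could be sketched as follows. The `configure_logging` function, the logger name, the message wording, and the choice of DEBUG versus INFO levels are assumptions; the values logged in the demo are illustrative only, not real output from the script.

```python
import logging


def configure_logging(verbose=False):
    """Set the root log level from the --verbose flag and return a logger."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return logging.getLogger("pull_pitching_stats")  # hypothetical logger name


log = configure_logging(verbose=True)

# Illustrative messages covering the four items listed above.
log.info("Scraping %s",
         "https://www.baseball-reference.com/leagues/majors/2025-standard-pitching.shtml")
log.debug("Found %d columns and %d rows", 3, 1)
log.warning("Missing columns: %s", ["LD", "PU"])
log.info("Saved %s", "data-input/2025 Live Cardset/pitching.csv")
```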