The source website uses <span class='energy-text energy-text--type-fire'>
to render inline energy icons. BeautifulSoup's get_text() was stripping
these spans, losing the energy type information and causing merged text
like 'Discard aEnergy' instead of 'Discard a Fire Energy'.
Changes:
- Add ENERGY_TEXT_TYPES mapping for inline energy references
- Add replace_energy_text_spans() to convert spans to text before extraction
- Add extract_effect_text() helper with proper text joining (separator=' ')
- Update parse_attack(), parse_ability(), _parse_trainer_details() to use it
- Fix JSON encoding in convert_cards.py to use UTF-8 (ensure_ascii=False)
Before: 'Discard an Energy from this Pokémon'
After: 'Discard a Fire Energy from this Pokémon'
Re-scraped all 372 cards and regenerated 382 definitions.
Scraper fixes:
- Detect fossil cards (Helix/Dome Fossil, Old Amber) as Trainer/Item cards
- Add text artifact cleaning for stripped energy icons:
- 'aEnergy' -> 'an Energy'
- 'extraEnergy' -> 'extra Energy'
- 'BenchedPokémon' -> 'Benched Pokémon'
- And 20+ other common patterns
Converter improvements:
- Add evolution chain validation to detect broken evolves_from references
- Track conversion errors and validation warnings in _index.json
- Return errors from convert_set() for better debugging
Data fixes:
- Fixed 4 fossil cards (now correctly typed as trainer/item)
- Fixed text artifacts in 46 raw card files
- Regenerated all 382 card definitions
- All evolution chains now valid
Added fix_raw_text.py utility script for batch text cleanup.