fix: homelab-audit.sh variable interpolation and collector bugs (#23) #34
development/ace-step-local-network.md (new file, 95 lines)
---
title: "ACE-Step 1.5 — Local Network Setup Guide"
description: "How to run ACE-Step AI music generator on the local network via Gradio UI or REST API, including .env configuration and startup notes."
type: guide
domain: development
tags: [ace-step, ai, music-generation, gradio, gpu, cuda]
---

# ACE-Step 1.5 — Local Network Setup

ACE-Step is an open-source AI music generation model. This guide covers running it on the workstation and serving the Gradio web UI to the local network.

## Location

```
/mnt/NV2/Development/ACE-Step-1.5/
```

Cloned from GitHub. Uses `uv` for dependency management — the `.venv` is created automatically on first run.

## Quick Start (Gradio UI)

```bash
cd /mnt/NV2/Development/ACE-Step-1.5
./start_gradio_ui.sh
```

Accessible from any device on the network at **http://10.10.0.41:7860** (substitute the workstation's current IP if it differs).

## .env Configuration

The `.env` file in the project root persists settings across git updates. Current config:

```env
SERVER_NAME=0.0.0.0
PORT=7860
LANGUAGE=en
```

### Key Settings

| Variable | Default | Description |
|----------|---------|-------------|
| `SERVER_NAME` | `127.0.0.1` | Set to `0.0.0.0` for LAN access |
| `PORT` | `7860` | Gradio UI port |
| `LANGUAGE` | `en` | UI language (`en`, `zh`, `he`, `ja`). **Must be set** — empty value causes `unbound variable` error with the launcher's `set -u` |
| `ACESTEP_CONFIG_PATH` | `acestep-v15-turbo` | DiT model variant |
| `ACESTEP_LM_MODEL_PATH` | `acestep-5Hz-lm-0.6B` | Language model for lyrics/prompts |
| `ACESTEP_INIT_LLM` | `auto` | `auto` / `true` / `false` — auto detects based on VRAM |
| `CHECK_UPDATE` | `true` | Set to `false` to skip interactive update prompt (useful for background/automated starts) |

See `.env.example` for the full list.

## REST API Server (Alternative)

For programmatic access instead of the web UI:

```bash
cd /mnt/NV2/Development/ACE-Step-1.5
./start_api_server.sh
```

Default: `http://127.0.0.1:8001`. To serve on LAN, edit `start_api_server.sh` line 12:

```bash
HOST="0.0.0.0"
```

API docs available at `http://<ip>:8001/docs`.

## Hardware Profile (Workstation)

- **GPU**: NVIDIA RTX 4080 SUPER (16 GB VRAM)
- **Tier**: 16GB class — auto-enables CPU offload, INT8 quantization, LLM
- **Max batch (with LM)**: 4
- **Max batch (without LM)**: 8
- **Max duration (with LM)**: 480s (8 min)
- **Max duration (without LM)**: 600s (10 min)

## Startup Behavior

1. Loads `.env` configuration
2. Checks for git updates (interactive prompt — set `CHECK_UPDATE=false` to skip)
3. Creates `.venv` via `uv sync` if missing (slow on first run)
4. Runs legacy NVIDIA torch compatibility check
5. Loads DiT model → quantizes to INT8 → loads LM → allocates KV cache
6. Launches Gradio with queue for multi-user support

Full startup takes ~30-40 seconds after first run.

## Gotchas

- **LANGUAGE must be set in `.env`**: The system `$LANGUAGE` locale variable can be empty, causing the launcher to crash with `unbound variable` due to `set -u`. Always include `LANGUAGE=en` in `.env`.
- **Update prompt blocks background execution**: If running headlessly or from a script, set `CHECK_UPDATE=false` to avoid the interactive Y/N prompt.
- **Model downloads**: First run downloads ~4-5 GB of model weights from HuggingFace. Subsequent runs use cached checkpoints in `./checkpoints/`.
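
The `set -u` failure mode above can be reproduced in isolation. This sketch (independent of ACE-Step's actual launcher) shows why an unset `LANGUAGE` crashes a strict-mode script, and how a `${VAR:-default}` expansion would avoid it:

```shell
# set -u makes bash abort on any reference to an unset variable.
unset LANGUAGE
bash -c 'set -u; echo "$LANGUAGE"' 2>/dev/null || echo "crash: unbound variable"

# A parameter-expansion default is safe under set -u and also covers empty values.
bash -c 'set -u; echo "LANGUAGE=${LANGUAGE:-en}"'
```

Setting `LANGUAGE=en` in `.env` has the same effect as the fallback: the variable is guaranteed non-empty before the launcher references it.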
development/subagent-write-permission-blocked.md (new file, 80 lines)
---
title: "Fix: Subagent Write/Edit tools blocked by permission mode mismatch"
description: "Claude Code subagents cannot use Write or Edit tools unless spawned with mode: acceptEdits — other permission modes (dontAsk, auto, bypassPermissions) do not grant file-write capability."
type: troubleshooting
domain: development
tags: [troubleshooting, claude-code, permissions, agents, subagents]
---

# Fix: Subagent Write/Edit tools blocked by permission mode mismatch

**Date:** 2026-03-28
**Severity:** Medium — blocks all agent-driven code generation workflows until identified

## Problem

When orchestrating multi-agent code generation (spawning engineer agents to write code in parallel), all subagents could Read/Glob/Grep files, but Write and Edit tool calls were silently denied. Agents would complete their analysis, prepare the full file content, then report "blocked on Write/Edit permission."

This happened across **every** permission mode tried:
- `mode: bypassPermissions` — denied (with worktree isolation)
- `mode: auto` — denied (with and without worktree isolation)
- `mode: dontAsk` — denied (with and without worktree isolation)

## Root Cause

Claude Code's Agent tool has multiple permission modes that control different things:

| Mode | What it controls | Grants Write/Edit? |
|------|-----------------|-------------------|
| `default` | User prompted for each tool call | No — user must approve each |
| `dontAsk` | Suppresses user prompts | **No** — suppresses prompts but doesn't grant capability |
| `auto` | Auto-approves based on context | **No** — same issue |
| `bypassPermissions` | Skips permission-manager hooks | **No** — only bypasses plugin hooks, not tool-level gates |
| `acceptEdits` | Grants file modification capability | **Yes** — this is the correct mode |

The key distinction: `dontAsk`/`auto`/`bypassPermissions` control the **user-facing permission prompt** (whether the user gets asked to approve). But Write/Edit tools have an **internal capability gate** that checks whether the agent was explicitly authorized to modify files. Only `acceptEdits` provides that authorization.

## Additional Complication: permission-manager plugin

The `permission-manager@agent-toolkit` plugin (`cmd-gate` PreToolUse hook) adds a second layer that blocks Bash-based file writes (output redirection `>`, `tee`, etc.). When agents fell back to Bash after Write/Edit failed, the plugin caught those too.

- `bypassPermissions` mode is documented to skip cmd-gate entirely, but this didn't work reliably in worktree isolation
- Disabling the plugin (`/plugin` → toggle off `permission-manager@agent-toolkit`, then `/reload-plugins`) removed the Bash-level blocks but did NOT fix Write/Edit

## Fix

**Use `mode: acceptEdits`** when spawning any agent that needs to create or modify files:

```python
Agent(
    subagent_type="engineer",
    mode="acceptEdits",  # <-- This is the critical setting
    prompt="..."
)
```

**Additional recommendations:**
- Worktree isolation (`isolation: "worktree"`) may compound permission issues — avoid it unless the agents genuinely need isolation (e.g., conflicting file edits)
- For agents that only read (reviewers, validators), any mode works
- If the permission-manager plugin is also blocking Bash fallbacks, disable it temporarily or add classifiers for the specific commands needed

## Reproduction

1. Spawn an engineer agent with `mode: dontAsk` and a prompt to create a new file
2. Agent will Read reference files successfully, prepare content, then report Write tool denied
3. Change to `mode: acceptEdits` — same prompt succeeds immediately

## Environment

- Claude Code CLI on Linux (Nobara/Fedora)
- Plugins: permission-manager@agent-toolkit (St0nefish/agent-toolkit)
- Agent types tested: engineer, general-purpose
- Models tested: sonnet subagents

## Lessons

- **Always use `acceptEdits` for code-writing agents.** The mode name is the clue — it's not just "accepting" edits from the user, it's granting the agent the capability to make edits.
- **`dontAsk` ≠ "can do anything."** It means "don't prompt the user" — but the capability to write files is a separate authorization layer.
- **Test agent permissions early.** When building a multi-agent orchestration workflow, verify the first agent can actually write before launching a full wave. A quick single-file test agent saves time.
- **Worktree isolation adds complexity.** Only use it when agents would genuinely conflict on the same files. For non-overlapping file changes, skip isolation.
- **The permission-manager plugin is a separate concern.** It blocks Bash file-write commands (`>`, `tee`, `cat` heredoc). Disabling it fixes Bash fallbacks but not Write/Edit tool calls. Both layers must be addressed independently.
gaming/release-2026.4.02.md (new file, 50 lines)
---
title: "MLB The Show Grind — 2026.4.02"
description: "Pack opening command, full cycle orchestrator, keyboard dismiss fix, package split."
type: reference
domain: gaming
tags: [release-notes, deployment, mlb-the-show, automation]
---

# MLB The Show Grind — 2026.4.02

**Date:** 2026-04-02
**Project:** mlb-the-show (`/mnt/NV2/Development/mlb-the-show`)

## Release Summary

Added pack opening automation and a full buy→exchange→open cycle command. Fixed a critical bug where KEYCODE_BACK was closing the buy order modal instead of dismissing the keyboard, preventing all order placement. Split the 1600-line single-file script into a proper Python package.

## Changes

### New Features
- **`open-packs` command** — navigates to My Packs, finds the target pack by name (default: Exchange - Live Series Gold), rapid-taps Open Next at ~0.3s/pack with periodic verification
- **`cycle` command** — full orchestrated flow: buy silvers for specified OVR tiers → exchange all dupes into gold packs → open all gold packs
- **`DEFAULT_PACK_NAME` constant** — `"Exchange - Live Series Gold"` extracted from inline strings

### Bug Fixes
- **Keyboard dismiss fix** — `KEYCODE_BACK` was closing the entire buy order modal instead of just dismissing the numeric keyboard. Replaced with `tap(540, 900)` to tap a neutral area. This was the root cause of all buy orders silently failing (0 orders placed despite cards having room).
- **`full_cycle` passed no args to `open_packs()`** — now passes `packs_exchanged` count to bound the open loop
- **`isinstance(result, dict)` dead code** removed from `full_cycle` — `grind_exchange` always returns `int`
- **`_find_nearest_open_button`** — added x-column constraint (200px) and zero-width element filtering to prevent matching ghost buttons from collapsed packs

### Refactoring
- **Package split** — `scripts/grind.py` (1611 lines) → `scripts/grind/` package:
  - `constants.py` (104 lines) — coordinates, price gates, UI element maps
  - `adb_utils.py` (125 lines) — ADB shell, tap, swipe, dump_ui, element finders
  - `navigation.py` (107 lines) — screen navigation (nav_to, nav_tab, FAB)
  - `exchange.py` (283 lines) — gold exchange logic
  - `market.py` (469 lines) — market scanning and buy order placement
  - `packs.py` (131 lines) — pack opening
  - `__main__.py` (390 lines) — CLI entry point and orchestrators (grind_loop, full_cycle)
- `scripts/grind.py` retained as a thin wrapper for `uv run` backward compatibility
- Invocation changed from `uv run scripts/grind.py` to `PYTHONPATH=scripts python3 -m grind`
- Raw `adb("input swipe ...")` calls replaced with `swipe()` helper
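
The `tap()`/`swipe()` helpers referenced above are not shown in this note; a minimal sketch of what such `adb_utils.py` wrappers typically look like (function names taken from the text, implementation assumed) could be:

```python
import subprocess

def adb_cmd(shell_args: str) -> list[str]:
    """Build the adb invocation for a shell subcommand."""
    return ["adb", "shell", *shell_args.split()]

def adb(shell_args: str) -> str:
    """Run an adb shell command and return stdout (assumes a device is connected)."""
    return subprocess.run(adb_cmd(shell_args), capture_output=True,
                          text=True, check=True).stdout

def tap(x: int, y: int) -> str:
    """Tap a screen coordinate — e.g. tap(540, 900) for the keyboard-dismiss fix above."""
    return adb(f"input tap {x} {y}")

def swipe(x1: int, y1: int, x2: int, y2: int, ms: int = 300) -> str:
    """Swipe between two points over `ms` milliseconds."""
    return adb(f"input swipe {x1} {y1} {x2} {y2} {ms}")
```

Wrapping raw `input tap`/`input swipe` strings like this is what makes the later "replace raw `adb(...)` calls with `swipe()`" cleanup possible.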

## Session Stats

- **Buy orders placed:** 532 orders across two runs (474 + 58)
- **Stubs spent:** ~63,655
- **Gold packs exchanged:** 155 (94 + 61)
- **Gold packs opened:** 275
- **OVR tiers worked:** 77 (primary), 78 (all above max price)
@@ -214,6 +214,58 @@ For full HDR setup (vk-hdr-layer, KDE config, per-API env vars), see the **steam

**Diagnostic tip**: Look for rapid retry patterns in Pi-hole logs (same domain queried every 1-3s from the Xbox IP) — this signals a blocked domain causing timeout loops.

## Gray Zone Warfare — EAC Failures on Proton (2026-03-31) [RESOLVED]

**Severity:** High — game unplayable online
**Status:** RESOLVED — corrupted prebuild world cache file

**Problem:** EAC errors when connecting to servers on Linux/Proton. Three error codes observed across attempts:
- `0x0002000A` — "The client failed an anti-cheat client runtime check" (the actual root cause)
- `0x0002000F` — "The client failed to register in time" (downstream timeout)
- `0x00020011` — "The client failed to start the session" (downstream session failure)

The game launches fine and the EAC bootstrapper reports success, but joining a server fails at "Synchronizing Live Data".

**Root Cause:** A corrupted/stale prebuild world cache file that EAC flagged during runtime checks:
```
LogEOSAntiCheat: [AntiCheatClient] [PollStatusInternal] Client Violation with Type: 5
Message: Unknown file version (GZW/Content/SKALLA/PrebuildWorldData/World/cache/0xb9af63cee2e43b6c_0x3cb3b3354fb31606.dat)
```
EAC scanned this file, found an unrecognized version, and flagged a client violation. The other errors (`0x0002000F`, `0x00020011`) were downstream consequences — EAC couldn't complete session registration after the violation.

Compounding factors that made diagnosis harder:
- Epic EOS scheduled maintenance (Fortnite v40.10, Apr 1 08:00-09:30 UTC) returned 503s from `api.epicgames.dev/auth/v1/oauth/token`, masking the real issue
- `steam_api64.dll` EOS SDK errors at startup are **benign noise** under Proton — red herring
- Nuking the compatdata prefix and upgrading Proton happened concurrently, adding confusion

**Fix:**
1. Delete the specific cache file: `rm "GZW/Content/SKALLA/PrebuildWorldData/World/cache/0xb9af63cee2e43b6c_0x3cb3b3354fb31606.dat"`
2. Verify game files in Steam — Steam redownloads a fresh copy with a different hash
3. Launch the game — clean logs, no EAC errors

Key detail: the file was the same size (60.7MB) before and after but had a different md5 hash — Steam's verify replaced it with a corrected version.

**Log locations:**
- EAC bootstrapper: `compatdata/2479810/pfx/drive_c/users/steamuser/AppData/Roaming/EasyAntiCheat/.../anticheatlauncher.log`
- Game log: `compatdata/2479810/pfx/drive_c/users/steamuser/AppData/Local/GZW/Saved/Logs/GZW.log`
- STL launch log: `~/.config/steamtinkerlaunch/logs/gamelaunch/id/2479810.log`

**What did NOT fix it (for reference):**
1. Installing Proton EasyAntiCheat Runtime (AppID 1826330) — good to have but not the issue
2. Deleting the entire cache directory without re-verifying — Steam verify re-downloaded the same bad file the first time (20 files fixed); a second targeted delete + verify was needed
3. Nuking the compatdata prefix for a clean rebuild
4. Switching Proton versions (GE-Proton9-25 ↔ GE-Proton10-25)

**Lessons:**
- When EAC logs show "Unknown file version" for a specific `.dat` file, delete that file and verify — don't nuke the whole cache or prefix
- `steam_api64.dll` EOS errors are benign under Proton and not related to EAC failures
- Check Epic's status page for scheduled maintenance before deep-diving Proton issues
- Multiple verify-and-fix cycles may be needed — the first verify can redownload a stale cached version from Steam's CDN

**Game version:** 0.4.0.0-231948-H (EA Pre-Alpha)
**Working Proton:** GE-Proton10-25
**STL config:** `~/.config/steamtinkerlaunch/gamecfgs/id/2479810.conf`

## Useful Commands

### Check Running Game Process
major-domo/database-release-2026.4.1.md (new file, 34 lines)
---
title: "Database API Release — 2026.4.1"
description: "Query limit caps to prevent worker timeouts, plus hotfix to exempt /players endpoint."
type: reference
domain: major-domo
tags: [release-notes, deployment, database, hotfix]
---

# Database API Release — 2026.4.1

**Date:** 2026-04-01
**Tag:** `2026.3.7` + 3 post-tag commits (CI auto-generates CalVer on merge)
**Image:** `manticorum67/major-domo-database`
**Server:** akamai (`~/container-data/sba-database`)
**Deploy method:** `docker compose pull && docker compose down && docker compose up -d`

## Release Summary

Added bounded pagination (`MAX_LIMIT=500`, `DEFAULT_LIMIT=200`) to all list endpoints to prevent Gunicorn worker timeouts caused by unbounded queries. Two follow-up fixes corrected response `count` fields in fieldingstats that were computed after the limit was applied. A hotfix (PR #103) then removed the caps from the `/players` endpoint specifically, since the bot and website depend on fetching full player lists.

## Changes

### Bug Fixes
- **PR #99** — Fix unbounded API queries causing Gunicorn worker timeouts. Added `MAX_LIMIT=500` and `DEFAULT_LIMIT=200` constants in `dependencies.py`, enforced `le=MAX_LIMIT` on all list endpoints. Added middleware to strip empty query params, preventing validation bypass.
- **PR #100** — Fix fieldingstats `get_fieldingstats` count: captured `total_count` before `.limit()` so the response reflects total rows, not page size.
- **PR #101** — Fix fieldingstats `get_totalstats`: removed a line that overwrote `count` with `len(page)` after it was correctly set from `total_count`.

### Hotfix
- **PR #103** — Remove output caps from `GET /api/v3/players`. Reverted the `limit` param to `Optional[int] = Query(default=None, ge=1)` (no ceiling). The `/players` table is a bounded dataset (~1500 rows/season) and consumers depend on uncapped results. All other endpoints retain their caps.

## Deployment Notes
- No migrations required
- No config changes
- Rollback: `docker compose pull manticorum67/major-domo-database:<previous-tag> && docker compose down && docker compose up -d`
major-domo/release-2026.3.31-2.md (new file, 38 lines)
---
title: "Discord Bot Release — 2026.3.13"
description: "Enforce free agency lock deadline — block /dropadd FA pickups after week 14, plus performance batch from backlog issues."
type: reference
domain: major-domo
tags: [release-notes, deployment, discord, major-domo]
---

# Discord Bot Release — 2026.3.13

**Date:** 2026-03-31
**Tag:** `2026.3.13`
**Image:** `manticorum67/major-domo-discordapp:2026.3.13` / `:production`
**Server:** akamai (`~/container-data/major-domo`)
**Deploy method:** `.scripts/deploy.sh -y` (docker compose pull + up)

## Release Summary

Enforces the previously unused `fa_lock_week` config (week 14) in the transaction builder. After the deadline, `/dropadd` blocks adding players FROM Free Agency while still allowing drops TO FA. Also includes a batch of performance PRs from the backlog that were merged between 2026.3.12 and this tag.

## Changes

### New Features
- **Free agency lock enforcement** — `TransactionBuilder.add_move()` now checks `current_week >= fa_lock_week` and rejects FA pickups after the deadline. Dropping to FA remains allowed. Config already existed at `fa_lock_week = 14` but was never enforced. (PR #122)
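
A hedged sketch of that rule (names illustrative, not the actual `TransactionBuilder` code): pickups from FA are rejected once the lock week is reached, while drops to FA stay open:

```python
FA_LOCK_WEEK = 14  # matches the existing config value fa_lock_week = 14

def fa_move_allowed(current_week: int, *, from_free_agency: bool) -> bool:
    """True if the move may proceed. Only FA *pickups* are blocked at/after the deadline."""
    if from_free_agency:
        return current_week < FA_LOCK_WEEK
    return True  # dropping a player TO free agency is always allowed
```

Note the comparison is `current_week < FA_LOCK_WEEK`, mirroring the `current_week >= fa_lock_week` rejection check described above: week 14 itself is already locked.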

### Performance
- Eliminate redundant API calls in trade views (PR #116, issue #94)
- Eliminate redundant GET after create/update and parallelize stats (PR #112, issue #95)
- Parallelize N+1 player/creator lookups with `asyncio.gather()` (PR #118, issue #89)
- Consolidate duplicate `league_service.get_current_state()` calls in `add_move()` into a single shared fetch (PR #122)
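
The `asyncio.gather()` pattern behind the N+1 fix can be sketched generically (`fetch_player` is a hypothetical stand-in, not the bot's real service layer):

```python
import asyncio

async def fetch_player(player_id: int) -> dict:
    """Stand-in for one API lookup; real code awaits an HTTP call here."""
    await asyncio.sleep(0.01)
    return {"id": player_id}

async def fetch_players_sequential(ids: list[int]) -> list[dict]:
    # N round-trips, one at a time — the N+1 shape being eliminated.
    return [await fetch_player(i) for i in ids]

async def fetch_players_parallel(ids: list[int]) -> list[dict]:
    # All lookups run concurrently; gather preserves input order in its results.
    return list(await asyncio.gather(*(fetch_player(i) for i in ids)))

print(asyncio.run(fetch_players_parallel([1, 2, 3])))  # → [{'id': 1}, {'id': 2}, {'id': 3}]
```

Total latency drops from roughly N × round-trip to one round-trip, which is why this shows up repeatedly in the performance batches.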

### Bug Fixes
- Fix race condition: use per-user dict for `_checked_teams` in trade views (PR #116)

## Deployment Notes
- No migrations required
- No config changes needed — `fa_lock_week = 14` already existed in config
- Rollback: `ssh akamai "cd ~/container-data/major-domo && docker pull manticorum67/major-domo-discordapp@sha256:94d59135f127d5863b142136aeeec9d63b06ee63e214ef59f803cedbd92b473e && docker tag manticorum67/major-domo-discordapp@sha256:94d59135f127d5863b142136aeeec9d63b06ee63e214ef59f803cedbd92b473e manticorum67/major-domo-discordapp:production && docker compose up -d discord-app"`
major-domo/release-2026.3.31.md (new file, 86 lines)
---
title: "Discord Bot Release — 2026.3.12"
description: "Major catch-up release: trade deadline enforcement, performance parallelization, security fixes, CI/CD migration to CalVer, and 148 commits of accumulated improvements."
type: reference
domain: major-domo
tags: [release-notes, deployment, discord, major-domo]
---

# Discord Bot Release — 2026.3.12

**Date:** 2026-03-31
**Tag:** `2026.3.12`
**Image:** `manticorum67/major-domo-discordapp:2026.3.12` / `:production`
**Server:** akamai (`~/container-data/major-domo`)
**Deploy method:** `.scripts/deploy.sh -y` (docker compose pull + up)
**Previous tag:** `v2.29.4` (148 commits behind)

## Release Summary

Large catch-up release covering months of accumulated work since the last tag. The headline feature is trade deadline enforcement — `/trade` commands are now blocked after the configured deadline week, with fail-closed behavior when API data is unavailable. Also includes significant performance improvements (parallelized API calls, cached signatures, Redis SCAN), security hardening, dependency pinning, and a full CI/CD migration from version-file bumps to CalVer tag-triggered builds.

## Changes

### New Features
- **Trade deadline enforcement** — `is_past_trade_deadline` property on Current model; guards on `/trade initiate`, submit button, and `_finalize_trade`. Fail-closed when API returns no data. 4 new tests. (PR #121)
- `is_admin()` helper in `utils/permissions.py` (#55)
- Team ownership verification on `/injury set-new` and `/injury clear` (#18)
- Current week number included in weekly-info channel posts
- Local deploy script for production deploys

### Performance
- Parallelize independent API calls with `asyncio.gather()` (#90)
- Cache `inspect.signature()` at decoration time (#97)
- Replace `json.dumps` serialization test with `isinstance` fast path (#96)
- Use `channel.purge()` instead of per-message delete loops (#93)
- Parallelize schedule_service week fetches (#88)
- Replace Redis `KEYS` with `SCAN` in `clear_prefix` (#98)
- Reuse persistent `aiohttp.ClientSession` in GiphyService (#26)
- Cache user team lookup in player_autocomplete, reduce limit to 25
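
The `inspect.signature()` caching fix (#97) follows a standard decorator pattern: compute the signature once at decoration time rather than on every call. A generic sketch (not the bot's actual decorator):

```python
import inspect
from functools import wraps

def log_call(func):
    sig = inspect.signature(func)  # computed once, when the function is decorated

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Per-call work only *binds* arguments against the cached signature.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        print(f"{func.__name__}({dict(bound.arguments)})")
        return func(*args, **kwargs)
    return wrapper

@log_call
def add(a, b=2):
    return a + b
```

`inspect.signature()` is comparatively expensive; hoisting it out of the hot path matters for decorators that wrap every command handler.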

### Bug Fixes
- Fix chart_service path from `data/` to `storage/`
- Make ScorecardTracker methods async to match await callers
- Prevent partial DB writes and show detailed errors on scorecard submission failure
- Add trailing slashes to API URLs to prevent 307 redirects dropping POST bodies
- Trade validation: check against next week's projected roster, include pending trades and org affiliate transactions
- Prefix trade validation errors with team abbreviation
- Auto-detect player roster type in trade commands instead of assuming ML
- Fix key plays score text ("tied at X" instead of "Team up X-X") (#48)
- Fix scorebug stale data, win probability parsing, and read-failure tolerance (#39, #40)
- Batch quick-wins: 4 issues resolved (#37, #27, #25, #38)
- Fix ContextualLogger crash when callers pass `exc_info=True`
- Fix thaw report posting to use channel ID instead of channel names
- Use explicit America/Chicago timezone for freeze/thaw scheduling
- Replace broken `@self.tree.interaction_check` with MaintenanceAwareTree subclass
- Implement actual maintenance mode flag in `/admin-maintenance` (#28)
- Validate and sanitize pitching decision data from Google Sheets
- Fix `/player` autocomplete timeout by using current season only
- Split read-only data volume to allow state file writes (#85)
- Update roster labels to use Minor League and Injured List (#59)

### Security
- Address 7 security issues across the codebase
- Remove 226 unused imports (#33)
- Pin all Python dependency versions in `requirements.txt` (#76)

### Refactoring & Cleanup
- Extract duplicate command hash logic into `_compute_command_hash` (#31)
- Move 42 unnecessary lazy imports to top-level
- Remove dead maintenance mode artifacts in bot.py (#104)
- Remove unused `weeks_ahead` parameter from `get_upcoming_games`
- Invalidate roster cache after submission instead of force-refreshing

## Infrastructure Changes
- **CI/CD migration**: Switched from version-file bumps to CalVer tag-triggered Docker builds
- Added `.scripts/release.sh` for creating CalVer tags
- Updated `.scripts/deploy.sh` for tag-triggered releases
- Docker build cache switched from `type=gha` to `type=registry`
- Used `docker-tags` composite action for multi-channel release support
- Fixed act_runner auth with short-form local actions + full GitHub URLs
- Use Gitea API for tag creation to avoid branch protection failures

## Deployment Notes
- No migrations required
- No config changes needed
- Rollback: `ssh akamai "cd ~/container-data/major-domo && docker pull manticorum67/major-domo-discordapp@<previous-digest> && docker tag <digest> manticorum67/major-domo-discordapp:production && docker compose up -d discord-app"`
major-domo/troubleshooting-gunicorn-worker-timeouts.md (new file, 59 lines)
---
title: "Fix: Gunicorn Worker Timeouts from Unbounded API Queries"
description: "External clients sent limit=99999 and empty filter params through the reverse proxy, causing API workers to timeout and get killed."
type: troubleshooting
domain: major-domo
tags: [troubleshooting, major-domo, database, deployment, docker]
---

# Fix: Gunicorn Worker Timeouts from Unbounded API Queries

**Date:** 2026-04-01
**PR:** cal/major-domo-database#99
**Issues:** #98 (main), #100 (fieldingstats count bug), #101 (totalstats count overwrite, pre-existing)
**Severity:** Critical — active production instability during Season 12, 12 worker timeouts in 2 days and accelerating

## Problem

The monitoring app kept flagging the SBA API container (`sba_db_api`) as unhealthy and restarting it. Container logs showed repeated `CRITICAL WORKER TIMEOUT` and `WARNING Worker was sent SIGABRT` messages from Gunicorn. The container itself wasn't restarting (0 Docker restarts, up 2 weeks), but individual workers were being killed and respawned, causing brief API unavailability windows.

## Root Cause

External clients (via nginx-proxy-manager at `172.25.0.3`) were sending requests with `limit=99999` and empty filter parameters (e.g., `?game_id=&pitcher_id=`). The API had no defenses:

- **No max limit cap** on any endpoint except `/players/search` (which had `le=50`). Clients could request 99,999 rows.
- **Empty string params passed validation** — FastAPI parsed `game_id=` as `['']`, which passed `if param is not None` checks but generated wasteful full-table-scan queries.
- **`/transactions` had no limit parameter at all** — always returned every matching row with recursive serialization (`model_to_dict(recurse=True)`).
- **Recursive serialization amplified cost** — each row triggered additional DB lookups for FK relations (player, team, etc.).

Combined, these caused queries to exceed the 120-second Gunicorn timeout, killing the worker.

### IP Attribution Gotcha

The initial assumption was that the Discord bot was the source (IP `172.25.0.3` was assumed to be the bot container). Docker IP mapping revealed `172.25.0.3` was actually **nginx-proxy-manager** — the queries came from external clients through the reverse proxy. The Discord bot is at `172.18.0.2` on a completely separate Docker network and generates none of these queries.

```bash
# Command to map container IPs
docker inspect --format='{{.Name}} {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -q)
```

## Fix

PR #99 merged into main with the following changes (27 files, 503 insertions):

1. **`MAX_LIMIT=500` and `DEFAULT_LIMIT=200` constants** in `app/dependencies.py`, enforced with `le=MAX_LIMIT` across all list endpoints
2. **`strip_empty_query_params` middleware** in `app/main.py` — strips empty string values from query params before FastAPI parses them, so `?game_id=` is treated as absent
3. **`limit`/`offset` added to `/transactions`** — previously returned all rows; now defaults to 200, max 500, with `total_count` computed before pagination
4. **11 existing limit params capped** with `le=MAX_LIMIT`
5. **13 endpoints with no limit** received `limit`/`offset` params
6. **Manual `if limit < 1` guards removed** — now handled by FastAPI's `ge=1` validation
7. **5 unit tests** covering limit validation (422 on exceeding max, zero, negative), transaction response shape, and empty string stripping
8. **fieldingstats count bug fixed** — `.count()` was being called after `.limit()`, capping the reported count at the page size instead of total matching rows (#100)
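
The core of the empty-param middleware (item 2) is just query-string rewriting. A stdlib sketch of that transformation, independent of FastAPI (in the real middleware the rewrite would happen on the raw query string before FastAPI parses it):

```python
from urllib.parse import parse_qsl, urlencode

def strip_empty_query_params(query_string: str) -> str:
    """Drop params with empty values so `?game_id=&pitcher_id=5` behaves like `?pitcher_id=5`.

    Without this, FastAPI parses `game_id=` as [''], which passes
    `if param is not None` checks and poisons the generated SQL filters."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    return urlencode([(k, v) for k, v in pairs if v != ""])

print(strip_empty_query_params("game_id=&pitcher_id=5&limit="))  # → pitcher_id=5
```

Doing this globally, rather than per-endpoint, is what protects future endpoints automatically.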

## Lessons

- **Always verify container IP attribution** before investigating the wrong service. `docker inspect` with a format string is the canonical way to map IPs to container names. Don't assume based on Docker network proximity.
- **APIs should never trust client-provided limits** — enforce `le=MAX_LIMIT` on every list endpoint. The only safe endpoint was `/players/search`, which had been properly capped at `le=50`.
- **Empty string params are a silent danger** — FastAPI parses `?param=` as `['']`, not `None`. A global middleware is the right fix since it protects all endpoints, including future ones.
- **Recursive serialization (`model_to_dict(recurse=True)`) is O(n × related_objects)** — dangerous on unbounded queries. Consider forcing `short_output=True` for large result sets.
- **Heavy reformatting mixed with functional changes obscures bugs** — the fieldingstats count bug was missed in review because the file had 262 lines of diff from quote/formatting changes. Separate cosmetic and functional changes into different commits.
@ -562,6 +562,20 @@ tar -czf ~/jellyfin-config-backup-$(date +%Y%m%d).tar.gz ~/docker/jellyfin/confi

---

## PGS Subtitle Default Flags Causing Roku Playback Hang (2026-04-01)

**Severity:** Medium — affects all Roku/Apple TV clients attempting to play remuxes with PGS subtitles

**Problem:** Playback on Roku hangs at "Loading" and stops at 0 ms. Jellyfin logs show ffmpeg extracting all subtitle streams (including PGS) from the full-length movie before playback can begin. User Staci reported Jurassic Park (1993) taking forever to start on the living room Roku.

**Root Cause:** PGS (hdmv_pgs_subtitle) tracks flagged as `default` in MKV files cause the Roku client to auto-select them. Roku can't decode PGS natively, so Jellyfin must burn them in — triggering a full subtitle extraction pass and video transcode before any data reaches the client. 178 out of ~400 movies in the library had this flag set, mostly remuxes that predate the Tdarr `clrSubDef` flow plugin.

**Fix:**

1. **Batch fix (existing library):** Wrote `fix-pgs-defaults.sh` — scans all MKVs with `mkvmerge -J`, finds PGS tracks with `default_track: true`, clears via `mkvpropedit --edit track:N --set flag-default=0`. Key gotcha: mkvpropedit uses 1-indexed track numbers (`track_id + 1`), NOT `track:=ID` (which matches by UID). Script is on manticore at `/tmp/fix-pgs-defaults.sh`. Fixed 178 files, no re-encoding needed.
2. **Going forward (Tdarr):** The flow already has a "Clear Subtitle Default Flags" custom function plugin (`clrSubDef`) that clears default disposition on non-forced subtitle tracks during transcoding. New files processed by Tdarr are handled automatically.
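The index computation at the heart of the batch fix can be sketched in Python. This is a hypothetical reconstruction of what `fix-pgs-defaults.sh` computes (the real script is shell), but the JSON shape follows `mkvmerge -J` output and the `+ 1` shift encodes the 1-indexed `track:N` gotcha:

```python
import json

def pgs_default_fixes(mkvmerge_json: str) -> list:
    """Return mkvpropedit arguments for PGS tracks flagged as default.

    mkvpropedit's track:N selector is 1-indexed by track order, so the
    0-indexed mkvmerge `id` must be shifted by +1. Using track:=ID would
    match by UID instead and silently edit the wrong track.
    """
    info = json.loads(mkvmerge_json)
    args = []
    for track in info.get("tracks", []):
        props = track.get("properties", {})
        if props.get("codec_id") == "S_HDMV/PGS" and props.get("default_track"):
            args.append(f"--edit track:{track['id'] + 1} --set flag-default=0")
    return args

sample = json.dumps({"tracks": [
    {"id": 0, "type": "video", "properties": {"codec_id": "V_MPEG4/ISO/AVC", "default_track": True}},
    {"id": 2, "type": "subtitles", "properties": {"codec_id": "S_HDMV/PGS", "default_track": True}},
]})
print(pgs_default_fixes(sample))  # ['--edit track:3 --set flag-default=0']
```

Each returned fragment is appended to `mkvpropedit <file>`; no re-encode is involved because only header flags change.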
**Lesson:** Remux files from automated downloaders almost always have PGS defaults set. Any bulk import of remuxes should be followed by a PGS default flag sweep. The CIFS media mount on manticore is read-only inside the Jellyfin container — mkvpropedit must run from the host against `/mnt/truenas/media/Movies`.

## Related Documentation

- **Setup Guide**: `/media-servers/jellyfin-ubuntu-manticore.md`
- **NVIDIA Driver Management**: See jellyfin-ubuntu-manticore.md

37
mlb-the-show/release-2026.3.28.md
Normal file
@ -0,0 +1,37 @@

---
title: "MLB The Show Market Tracker — 0.1.0"
description: "Initial release of the CLI market scanner with flip scanning and exchange program support."
type: reference
domain: gaming
tags: [release-notes, deployment, mlb-the-show, rust]
---

# MLB The Show Market Tracker — 0.1.0

**Date:** 2026-03-28
**Version:** `0.1.0`
**Repo:** `cal/mlb-the-show-market-tracker` on Gitea
**Deploy method:** Local CLI tool — `cargo build --release` on workstation

## Release Summary

Initial release of `showflip`, a Rust CLI tool for scanning the MLB The Show 26 Community Market. Supports finding profitable card flips and identifying silver cards at target buy-order prices for the gold pack exchange program.

## Changes

### New Features

- **`scan` command** — Concurrent market scanner that finds profitable flip opportunities. Supports filters for rarity, team, position, budget, and sorting by profit/margin. Includes watch mode for repeated scans and optional Discord webhook alerts.
- **`exchange` command** — Scans for silver cards (OVR 77-79) priced within configurable buy-order gates for the gold pack exchange program. Tiers: 79 OVR (target 170/max 175), 78 OVR (target 140/max 145), 77 OVR (target 117/max 122). Groups results by OVR with color-coded target/OK status.
- **`detail` command** — Shows price history and recent sales for a specific card by name or UUID.
- **`meta` command** — Lists available series, brands, and sets for use as filter values.
- OVR-based price floor calculation for live and non-live series cards
- 10% Community Market tax built into all profit calculations
- Handles API price format inconsistencies (integers vs comma-formatted strings)
- HTTP client with 429 retry handling
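The tax and price-parsing rules can be sketched as follows. This is an illustrative Python model of the Rust logic, not the actual `showflip` code, and the exact rounding it applies to the 10% tax is an assumption:

```python
def parse_price(raw) -> int:
    """The Show API returns prices as ints or comma-formatted strings."""
    if isinstance(raw, str):
        return int(raw.replace(",", ""))
    return int(raw)

def flip_profit(buy_price, sell_price) -> int:
    """Net stubs from a flip after the 10% Community Market tax on the sale.

    Rounding down the post-tax proceeds is an assumption; the real tool
    may round differently.
    """
    buy = parse_price(buy_price)
    sell = parse_price(sell_price)
    return int(sell * 0.90) - buy

print(flip_profit("1,500", "2,000"))  # 300
```

A flip is only worth listing when this value is positive; buying and reselling at the same price loses the 10% tax.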

## Deployment Notes

- No server deployment — runs locally via `cargo run -- <subcommand>`
- API is public at `https://mlb26.theshow.com/apis/` — no auth required
- No tests or CI configured yet

45
mlb-the-show/release-2026.3.31.md
Normal file
@ -0,0 +1,45 @@

---
title: "MLB The Show Companion Automation — 2026.3.31"
description: "Fix gold exchange navigation, add grind harness for automated buy→exchange loops, CLI cleanup."
type: reference
domain: gaming
tags: [release-notes, deployment, mlb-the-show, python, automation]
---

# MLB The Show Companion Automation — 2026.3.31

**Date:** 2026-03-31
**Repo:** `cal/mlb-the-show-market-tracker` on Gitea
**Branch:** `main` (merge commit `ea66e2c`)
**Deploy method:** Local script — `uv run scripts/grind.py`

## Release Summary

Major fixes to the companion app automation (`grind.py`). The gold exchange navigation was broken — the script thought it had entered the card grid when it was still on the exchange selection list. Added a new `grind` command that orchestrates the full buy→exchange loop with multi-tier OVR rotation.

## Changes

### Bug Fixes

- Fixed `_is_on_exchange_grid()` to require `Exchange Value` card labels, distinguishing the card grid from the Exchange Players list page (`d4c038b`)
- Added retry loop (3 attempts, 2s apart) in `ensure_on_exchange_grid()` for variable load times
- Added `time.sleep(2)` after tapping into the Gold Exchange grid
- Removed low-OVR bail logic — the grid is sorted ascending, so bail fired on first screen before scrolling to profitable cards
- Fixed buy-orders market scroll — retry loop attempts up to 10 scrolls before giving up (was 1) (`6912a7e`). Note: scroll method itself was still broken (KEYCODE_PAGE_DOWN); fixed in 2026.4.01 release.
- Restored `_has_low_ovr_cards` fix lost during PR #2 merge (`c29af78`)

### New Features

- **`grind` command** — automated buy→exchange loop with OVR tier rotation (`6912a7e`)
  - Rotates through OVR tiers in descending order (default: 79, 78, 77)
  - Buys 2 tiers per round, then exchanges all available dupes
  - Flags: `--ovrs`, `--rounds`, `--max-players`, `--max-price`, `--budget`, `--max-packs`
  - Per-round and cumulative summary output
  - Clean Ctrl+C handling with final totals

### CLI Changes

- Renamed `grind` → `exchange` (bulk exchange command)
- Removed redundant single-exchange command (use `exchange 1` instead)
- `grind` now refers to the full buy→exchange orchestration loop

## Known Issues

- Default price gates (`MAX_BUY_PRICES`) may be too low during market inflation periods. Current gates: 79→170, 78→140, 77→125. Use `--max-price` to override.
- No order fulfillment polling — the grind loop relies on natural timing (2 buy rounds ≈ 2-5 min gives orders time to fill)

26
mlb-the-show/release-2026.4.01.md
Normal file
@ -0,0 +1,26 @@

---
title: "MLB The Show Companion Automation — 2026.4.01"
description: "Fix buy-orders scroll to use touch swipes, optimize exchange card selection."
type: reference
domain: gaming
tags: [release-notes, deployment, mlb-the-show, python, automation]
---

# MLB The Show Companion Automation — 2026.4.01

**Date:** 2026-04-01
**Repo:** `cal/mlb-the-show-market-tracker` on Gitea
**Branch:** `main` (latest `f15e98a`)
**Deploy method:** Local script — `uv run scripts/grind.py`

## Release Summary

Two fixes to the companion app automation. The buy-orders command couldn't scroll through the market list because it used keyboard events instead of touch swipes. The exchange command now stops selecting cards once it has enough points for a pack.

## Changes

### Bug Fixes

- **Fixed buy-orders market scrolling** — replaced `KEYCODE_PAGE_DOWN` (keyboard event ignored by WebView) with `scroll_load_jiggle()` which uses touch swipes + a reverse micro-swipe to trigger lazy loading. This matches the working exchange scroll strategy. (`49fe7b6`)

### Optimizations

- **Early break in exchange card selection** — the selection loop now stops as soon as accumulated points meet the exchange threshold, avoiding unnecessary taps on additional card types the app won't consume. (`f15e98a`)

273
monitoring/scripts/homelab-audit.sh
Executable file
@ -0,0 +1,273 @@

#!/usr/bin/env bash
# homelab-audit.sh — SSH-based homelab health audit
#
# Runs on the Proxmox host. Discovers running LXCs and VMs, SSHes into each
# to collect system metrics, then generates a summary report.
#
# Usage:
#   homelab-audit.sh [--output-dir DIR]
#
# Environment overrides:
#   STUCK_PROC_CPU_WARN  CPU% at which a D-state process is flagged (default: 10)
#   REPORT_DIR           Output directory for per-host reports and logs
#   SSH_USER             Remote user (default: root)

# -e omitted intentionally — unreachable hosts should not abort the full audit
set -uo pipefail

# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
STUCK_PROC_CPU_WARN="${STUCK_PROC_CPU_WARN:-10}"
REPORT_DIR="${REPORT_DIR:-/tmp/homelab-audit-$(date +%Y%m%d-%H%M%S)}"
SSH_USER="${SSH_USER:-root}"
SSH_OPTS="-o StrictHostKeyChecking=accept-new -o ConnectTimeout=10 -o BatchMode=yes"

DISK_WARN=80
DISK_CRIT=90
LOAD_WARN=2.0
MEM_WARN=85
ZOMBIE_WARN=1

while [[ $# -gt 0 ]]; do
  case "$1" in
    --output-dir)
      if [[ $# -lt 2 ]]; then
        echo "Error: --output-dir requires an argument" >&2
        exit 1
      fi
      REPORT_DIR="$2"
      shift 2
      ;;
    *)
      echo "Unknown option: $1" >&2
      exit 1
      ;;
  esac
done

mkdir -p "$REPORT_DIR"
SSH_FAILURES_LOG="$REPORT_DIR/ssh-failures.log"
FINDINGS_FILE="$REPORT_DIR/findings.txt"

# ---------------------------------------------------------------------------
# Remote collector script
#
# Kept single-quoted so no local variables are interpolated into the script
# body. STUCK_PROC_CPU_WARN is passed as $1 when invoking the remote bash
# session, so the configurable threshold reaches the collector without
# escaping issues.
# ---------------------------------------------------------------------------
COLLECTOR_SCRIPT='#!/usr/bin/env bash
STUCK_PROC_CPU_WARN="${1:-10}"

cpu_load() {
  uptime | awk -F"load average:" "{print \$2}" | awk -F"[, ]+" "{print \$2}"
}

mem_pct() {
  free | awk "/^Mem:/ {printf \"%.0f\", \$3/\$2*100}"
}

disk_usage() {
  df --output=pcent,target -x tmpfs -x devtmpfs 2>/dev/null | tail -n +2 | \
    while read -r pct mnt; do echo "${pct%%%} $mnt"; done
}

zombie_count() {
  ps -eo stat= | grep -c "^Z" || true
}

stuck_procs() {
  ps -eo stat=,pcpu=,comm= | \
    awk -v t="$STUCK_PROC_CPU_WARN" '\''$1 ~ /^D/ && $2+0 >= t+0 {print $3}'\'' | \
    paste -sd,
}

echo "CPU_LOAD=$(cpu_load)"
echo "MEM_PCT=$(mem_pct)"
echo "ZOMBIES=$(zombie_count)"
echo "STUCK_PROCS=$(stuck_procs)"
disk_usage | while read -r pct mnt; do
  echo "DISK $pct $mnt"
done
'

# ---------------------------------------------------------------------------
# SSH helper — logs stderr to ssh-failures.log instead of silently discarding
# ---------------------------------------------------------------------------
ssh_cmd() {
  local host="$1"
  shift
  # shellcheck disable=SC2086
  ssh $SSH_OPTS "${SSH_USER}@${host}" "$@" 2>>"$SSH_FAILURES_LOG"
}

# ---------------------------------------------------------------------------
# LXC IP discovery
#
# lxc-info only returns IPs for containers using Proxmox-managed DHCP bridges.
# Containers with static IPs defined inside the container (not via Proxmox
# network config) return nothing. Fall back to parsing `pct config` in that
# case to find the ip= field from the container's network interface config.
# ---------------------------------------------------------------------------
get_lxc_ip() {
  local ctid="$1"
  local ip
  ip=$(lxc-info -n "$ctid" -iH 2>/dev/null | head -1)
  if [[ -z "$ip" ]]; then
    ip=$(pct config "$ctid" 2>/dev/null | grep -oP '(?<=ip=)[^/,]+' | head -1)
  fi
  echo "$ip"
}

# ---------------------------------------------------------------------------
# Inventory: running LXCs and VMs
# Returns lines of "label ip"
# ---------------------------------------------------------------------------
collect_inventory() {
  # LXCs
  pct list 2>/dev/null | tail -n +2 | while read -r ctid status _name; do
    [[ "$status" != "running" ]] && continue
    local ip
    ip=$(get_lxc_ip "$ctid")
    [[ -n "$ip" ]] && echo "lxc-${ctid} $ip"
  done

  # VMs — use agent network info if available, fall back to qm config
  qm list 2>/dev/null | tail -n +2 | while read -r vmid _name status _mem _bootdisk _pid; do
    [[ "$status" != "running" ]] && continue
    local ip
    ip=$(qm guest cmd "$vmid" network-get-interfaces 2>/dev/null |
      python3 -c "
import sys, json
try:
    data = json.load(sys.stdin)
    for iface in data:
        for addr in iface.get('ip-addresses', []):
            if addr['ip-address-type'] == 'ipv4' and not addr['ip-address'].startswith('127.'):
                print(addr['ip-address'])
                raise SystemExit
except Exception:
    pass
" 2>/dev/null)
    [[ -n "$ip" ]] && echo "vm-${vmid} $ip"
  done
}

# ---------------------------------------------------------------------------
# Collect metrics from one host and record findings
# ---------------------------------------------------------------------------
parse_and_report() {
  local label="$1"
  local addr="$2"
  local raw

  if ! raw=$(echo "$COLLECTOR_SCRIPT" | ssh_cmd "$addr" bash -s -- "$STUCK_PROC_CPU_WARN"); then
    echo "SSH_FAILURE $label $addr" >>"$SSH_FAILURES_LOG"
    echo "WARN $label: SSH connection failed" >>"$FINDINGS_FILE"
    return
  fi

  while IFS= read -r line; do
    case "$line" in
      CPU_LOAD=*)
        local load="${line#CPU_LOAD=}"
        if [[ -n "$load" ]] && awk "BEGIN{exit !($load > $LOAD_WARN)}"; then
          echo "WARN $label: load average ${load} > ${LOAD_WARN}" >>"$FINDINGS_FILE"
        fi
        ;;
      MEM_PCT=*)
        local mem="${line#MEM_PCT=}"
        if [[ -n "$mem" ]] && ((mem >= MEM_WARN)); then
          echo "WARN $label: memory ${mem}% >= ${MEM_WARN}%" >>"$FINDINGS_FILE"
        fi
        ;;
      ZOMBIES=*)
        local zombies="${line#ZOMBIES=}"
        if [[ -n "$zombies" ]] && ((zombies >= ZOMBIE_WARN)); then
          echo "WARN $label: ${zombies} zombie process(es)" >>"$FINDINGS_FILE"
        fi
        ;;
      STUCK_PROCS=*)
        local procs="${line#STUCK_PROCS=}"
        if [[ -n "$procs" ]]; then
          echo "WARN $label: D-state procs with CPU>=${STUCK_PROC_CPU_WARN}%: ${procs}" >>"$FINDINGS_FILE"
        fi
        ;;
      DISK\ *)
        local pct mnt
        read -r _ pct mnt <<<"$line"
        if ((pct >= DISK_CRIT)); then
          echo "CRIT $label: disk ${mnt} at ${pct}% >= ${DISK_CRIT}%" >>"$FINDINGS_FILE"
        elif ((pct >= DISK_WARN)); then
          echo "WARN $label: disk ${mnt} at ${pct}% >= ${DISK_WARN}%" >>"$FINDINGS_FILE"
        fi
        ;;
    esac
  done <<<"$raw"
}

# ---------------------------------------------------------------------------
# Summary — driven by actual findings in findings.txt and ssh-failures.log
# ---------------------------------------------------------------------------
generate_summary() {
  local host_count="$1"
  local ssh_failure_count=0
  local warn_count=0
  local crit_count=0

  [[ -f "$SSH_FAILURES_LOG" ]] &&
    ssh_failure_count=$(grep -c '^SSH_FAILURE' "$SSH_FAILURES_LOG" 2>/dev/null || true)
  [[ -f "$FINDINGS_FILE" ]] &&
    warn_count=$(grep -c '^WARN' "$FINDINGS_FILE" 2>/dev/null || true)
  [[ -f "$FINDINGS_FILE" ]] &&
    crit_count=$(grep -c '^CRIT' "$FINDINGS_FILE" 2>/dev/null || true)

  echo ""
  echo "=============================="
  echo "  HOMELAB AUDIT SUMMARY"
  echo "=============================="
  printf "  Hosts audited : %d\n" "$host_count"
  printf "  SSH failures  : %d\n" "$ssh_failure_count"
  printf "  Warnings      : %d\n" "$warn_count"
  printf "  Critical      : %d\n" "$crit_count"
  echo "=============================="

  if ((warn_count + crit_count > 0)); then
    echo ""
    echo "Findings:"
    sort "$FINDINGS_FILE"
  fi

  if ((ssh_failure_count > 0)); then
    echo ""
    echo "SSH failures (see $SSH_FAILURES_LOG for details):"
    grep '^SSH_FAILURE' "$SSH_FAILURES_LOG" | awk '{print "  " $2 " (" $3 ")"}'
  fi

  echo ""
  echo "Reports: $REPORT_DIR"
}

# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
main() {
  echo "Starting homelab audit — $(date)"
  echo "Report dir: $REPORT_DIR"
  echo "STUCK_PROC_CPU_WARN threshold: ${STUCK_PROC_CPU_WARN}%"
  echo ""

  >"$FINDINGS_FILE"

  local host_count=0
  while read -r label addr; do
    echo "  Auditing $label ($addr)..."
    parse_and_report "$label" "$addr"
    ((host_count++)) || true
  done < <(collect_inventory)

  generate_summary "$host_count"
}

main "$@"

63
paper-dynasty/2026-03-30.md
Normal file
@ -0,0 +1,63 @@

---
title: "Refractor Phase 2: Integration — boost wiring, tests, and review"
description: "Implemented apply_tier_boost orchestration, dry_run evaluator, evaluate-game wiring with kill switch, and 51 new tests across paper-dynasty-database. PRs #176 and #177 merged."
type: context
domain: paper-dynasty
tags: [paper-dynasty-database, refractor, phase-2, testing]
---

# Refractor Phase 2: Integration — boost wiring, tests, and review

**Date:** 2026-03-30
**Branch:** `feature/refractor-phase2-integration` (merged to `main`)
**Repo:** paper-dynasty-database

## What Was Done

Full implementation of Refractor Phase 2 Integration — wiring the Phase 2 Foundation boost functions (PR #176) into the live evaluate-game endpoint so that tier-ups actually create boosted variant cards with modified ratings.

1. **PR #176 merged (Foundation)** — Review findings fixed (renamed `evolution_tier` to `refractor_tier`, removed redundant parens), then merged via pd-ops
2. **`evaluate_card(dry_run=True)`** — Added dry_run parameter to separate tier detection from tier write. `apply_tier_boost()` becomes the sole writer of `current_tier`, ensuring atomicity with variant creation. Added `computed_tier` and `computed_fully_evolved` to return dict.
3. **`apply_tier_boost()` orchestration** — Full flow: source card lookup, boost application per vs_hand split, variant card + ratings creation with idempotency guards, audit record with idempotency guard, atomic state mutations via `db.atomic()`. Display stat helpers compute fresh avg/obp/slg.
4. **`evaluate_game()` wiring** — Calls evaluate_card with dry_run=True, loops through intermediate tiers on tier-up, handles partial multi-tier failures (reports last successful tier), `REFRACTOR_BOOST_ENABLED` env var kill switch, suppresses false notifications when boost is disabled or card_type is missing.
5. **79-sum documentation fix** — Clarified all references to "79-sum" across code, tests, and docs to note the 108-total card invariant (79 variable + 29 x-check for pitchers).
6. **51 new tests** — Display stat unit tests (12), integration tests for orchestration (27), HTTP endpoint tests (7), dry_run evaluator tests (6). Total suite: 223 passed.
7. **Five rounds of swarm reviews** — Each change reviewed individually by swarm-reviewer agents. All findings addressed: false notification on null card_type, wrong tier in log message, partial multi-tier failure reporting, atomicity test accuracy, audit idempotency gap, import os placement.
8. **PR #177 merged** — Review found two issues (import os inside function, audit idempotency gap on PostgreSQL UNIQUE constraint). Both fixed, pushed, approved by Claude, merged via pd-ops.

## Decisions

### Display stats computed fresh, not set to None

The original PO review note suggested setting avg/obp/slg to None on variant cards and deferring recalculation. Cal decided to compute them fresh using the exact Pydantic validator formulas instead — strictly better than stale or missing values. Design doc updated to reflect this.

### Card/ratings creation outside db.atomic()

The design doc specified all writes inside `db.atomic()`. Implementation splits card/ratings creation outside (idempotent, retry-safe via get_or_none guards) with only state mutations (audit, tier write, Card.variant propagation) inside the atomic block. This is pragmatically correct — on retry, existing card/ratings are reused. Design doc updated.

### Kill switch suppresses notifications entirely

When `REFRACTOR_BOOST_ENABLED=false`, the router skips both the boost AND the tier_up notification (via `continue`). This prevents false notifications to the Discord bot during maintenance windows. Initially the code fell through and emitted a notification without a variant — caught during coverage gap analysis and fixed.
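A minimal sketch of the kill switch read, assuming a simple truthy-string parse; the router's actual default and accepted values are not shown in this note, so both are assumptions here:

```python
import os

def boost_enabled() -> bool:
    # Hypothetical parse: enabled unless explicitly turned off.
    # The real router's default and accepted values may differ.
    return os.environ.get("REFRACTOR_BOOST_ENABLED", "true").strip().lower() not in ("0", "false", "no")

# During a maintenance window:
os.environ["REFRACTOR_BOOST_ENABLED"] = "false"
if not boost_enabled():
    # the router does `continue` here, skipping boost AND tier_up notification
    print("skipping boost and tier_up notification")
```

The important part is that the disabled branch skips the whole loop iteration rather than falling through to the notification.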
### Audit idempotency guard added

PR review identified that `RefractorBoostAudit` has a `UNIQUE(card_state_id, tier)` constraint in PostgreSQL (from the migration) that the SQLite test DB doesn't enforce. Added `get_or_none` before `create` to prevent IntegrityError on retry.

## Follow-Up

- Phase 3: Documentation updates in `card-creation` repo (docs only, no code)
- Phase 4a: Validation test cases in `database` repo
- Phase 4b: Discord bot tier-up notification fix (must ship alongside or after Phase 2 deploy)
- Deploy Phase 2 to dev: run migration `2026-03-28_refractor_phase2_boost.sql` on dev DB
- Stale branches to clean up in database repo: `feat/evolution-refractor-schema-migration`, `test/refractor-tier3`

## Files Changed

**paper-dynasty-database:**

- `app/services/refractor_boost.py` — apply_tier_boost orchestration, display stat helpers, card_type validation, audit idempotency guard
- `app/services/refractor_evaluator.py` — dry_run parameter, computed_tier/computed_fully_evolved in return dict
- `app/routers_v2/refractor.py` — evaluate_game wiring, kill switch, partial multi-tier failure, isoformat crash fix
- `tests/test_refractor_boost.py` — 12 new display stat tests, 79-sum comment fixes
- `tests/test_refractor_boost_integration.py` — 27 new integration tests (new file)
- `tests/test_postgame_refractor.py` — 7 new HTTP endpoint tests
- `tests/test_refractor_evaluator.py` — 6 new dry_run unit tests

**paper-dynasty (parent repo):**

- `docs/refractor-phase2/01-phase1-foundation.md` — 79-sum clarifications
- `docs/refractor-phase2/02-phase2-integration.md` — atomicity boundary, display stats updates

48
paper-dynasty/open-packs-checkin-crash.md
Normal file
@ -0,0 +1,48 @@

---
title: "Fix: /open-packs crash from orphaned Check-In Player packs"
description: "Check-In Player packs with hyphenated name caused empty Discord select menu (400 Bad Request) and KeyError in callback."
type: troubleshooting
domain: paper-dynasty
tags: [troubleshooting, discord, paper-dynasty, packs, hotfix]
---

# Fix: /open-packs crash from orphaned Check-In Player packs

**Date:** 2026-03-26
**PR:** #134 (hotfix branch based on prod tag 2026.3.4, merged to main)
**Tag:** 2026.3.8
**Severity:** High — any user with an orphaned Check-In Player pack could not open any packs at all

## Problem

Running `/open-packs` returned: `HTTPException: 400 Bad Request (error code: 50035): Invalid Form Body — In data.components.0.components.0.options: This field is required`

Discord rejected the message because the select menu had zero options.

## Root Cause

Two cascading bugs triggered by the "Check-In Player" pack type name containing a hyphen:

1. **Empty select menu:** The `pretty_name` logic used `'-' not in key` to identify bare pack type names. "Check-In Player" contains a hyphen, so it fell into the `elif 'Team' in key` / `elif 'Cardset' in key` chain — matching neither. `pretty_name` stayed `None`, no `SelectOption` was created, and Discord rejected the empty options list.

2. **KeyError in callback (secondary):** Even if displayed, selecting "Check-In Player" would call `self.values[0].split('-')` producing `['Check', 'In Player']`, which matched none of the pack type tokens in the `if/elif` chain, raising `KeyError`.

Check-In Player packs are normally auto-opened during the daily check-in (`/comeonmanineedthis`). An orphaned pack existed because `roll_for_cards` had previously failed mid-flow, leaving an unopened pack in inventory.

## Fix

Three-layer fix applied to both `cogs/economy.py` (production) and `cogs/economy_new/packs.py` (main):

1. **Filter at source:** Added `AUTO_OPEN_TYPES = {"Check-In Player"}` set. Packs with these types are skipped during grouping with `continue`, so they never reach the select menu.

2. **Fallback for hyphenated names:** Added `else: pretty_name = key` after the `Team`/`Cardset` checks, so any future hyphenated pack type names still get a display label.

3. **Graceful error in callback:** Replaced `raise KeyError` with a user-facing ephemeral message ("This pack type cannot be opened manually. Please contact Cal.") and `return`.

Also changed all "contact an admin" strings to "contact Cal" in `discord_ui/selectors.py`.
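The filter-at-source and fallback layers can be sketched together. This is an illustration, not the cog code: the Team/Cardset formatting branch below is a simplified placeholder for the real display logic:

```python
AUTO_OPEN_TYPES = {"Check-In Player"}

def selectable_packs(pack_types):
    """Build (key, display label) pairs for the /open-packs select menu.

    Auto-open types are filtered out entirely, and any other hyphenated
    name falls back to the raw key instead of producing no option.
    """
    options = []
    for key in pack_types:
        if key in AUTO_OPEN_TYPES:
            continue  # auto-opened during the daily check-in; never shown
        if "-" not in key:
            pretty_name = key
        elif "Team" in key or "Cardset" in key:
            pretty_name = key.replace("-", " ")  # placeholder formatting
        else:
            pretty_name = key  # the fallback: previously stayed None -> empty menu
        options.append((key, pretty_name))
    return options

print(selectable_packs(["Gold", "Check-In Player", "Future-Stars"]))
```

With the old logic, "Check-In Player" produced no option at all; here it is filtered before grouping, and any other hyphenated name still gets a label.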
## Lessons

- **Production loads `cogs/economy.py`, not `cogs/economy_new/packs.py`.** The initial fix was applied to the wrong file. Always check which cogs are actually loaded by inspecting the bot startup logs (`Loaded cog: ...`) before assuming which file handles a command.
- **Hotfix branches based on old tags may have stale CI workflows.** The `docker-build.yml` at the tagged commit had an older trigger config (branch push, not tag push), so the CalVer tag silently failed to trigger CI. Cherry-pick the current workflow into hotfix branches.
- **Pack type names are used as dict keys and split on hyphens** throughout the open-packs flow. Any new pack type with a hyphen in its name will hit similar issues unless the grouping/parsing logic is refactored to stop using hyphen-delimited strings as composite keys.

62
productivity/codex-agents-marketplace.md
Normal file
@ -0,0 +1,62 @@

---
title: "Codex-to-Claude Agent Converter & Plugin Marketplace"
description: "Pipeline that converts VoltAgent/awesome-codex-subagents TOML definitions to Claude Code plugin marketplace format, hosted at cal/codex-agents on Gitea."
type: reference
domain: productivity
tags: [claude-code, automation, plugins, agents, gitea]
---

# Codex Agents Marketplace

## Overview

136+ specialized agent definitions converted from [VoltAgent/awesome-codex-subagents](https://github.com/VoltAgent/awesome-codex-subagents) (OpenAI Codex format) to Claude Code plugin marketplace format.

- **Repo**: `cal/codex-agents` on Gitea (`git@git.manticorum.com:cal/codex-agents.git`)
- **Local path**: `/mnt/NV2/Development/codex-agents/`
- **Upstream**: Cloned to `upstream/` (gitignored), pulled on each sync

## Sync Pipeline

```bash
cd /mnt/NV2/Development/codex-agents
./sync.sh            # pull upstream + convert changed agents
./sync.sh --force    # re-convert all regardless of hash
./sync.sh --dry-run  # preview only
./sync.sh --verbose  # per-agent status
```

- `convert.py` handles TOML → Markdown+YAML frontmatter conversion
- SHA-256 per-file hashes in `codex-manifest.json` skip unchanged agents
- Deleted upstream agents are auto-removed locally
- `.claude-plugin/marketplace.json` is regenerated on each sync

## Format Mapping

| Codex | Claude Code |
|-------|-------------|
| `gpt-5.4` + `high` | `model: opus` |
| `gpt-5.3-codex-spark` + `medium` | `model: sonnet` |
| `sandbox_mode: read-only` | `disallowedTools: Edit, Write` |
| `sandbox_mode: workspace-write` | full tool access |
| `developer_instructions` | markdown body |
| `"parent agent"` | replaced with `"orchestrating agent"` |
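The mapping table can be read as a single conversion step. The sketch below is an assumption-laden stand-in for `convert.py` (the field name `reasoning_effort`, the `name` key, and the exact frontmatter layout are all guesses); the input dict represents an already-parsed TOML agent definition:

```python
# Assumed lookup tables; the real convert.py may key these differently.
MODEL_MAP = {
    ("gpt-5.4", "high"): "opus",
    ("gpt-5.3-codex-spark", "medium"): "sonnet",
}

def to_frontmatter(agent: dict) -> str:
    """Render one parsed Codex TOML agent as Claude Code Markdown+YAML."""
    lines = ["---", f"name: {agent['name']}"]
    model = MODEL_MAP.get((agent.get("model"), agent.get("reasoning_effort")))
    if model:
        lines.append(f"model: {model}")
    if agent.get("sandbox_mode") == "read-only":
        lines.append("disallowedTools: Edit, Write")
    lines.append("---")
    # developer_instructions becomes the markdown body, with the
    # "parent agent" phrasing rewritten per the mapping table.
    body = agent.get("developer_instructions", "").replace("parent agent", "orchestrating agent")
    return "\n".join(lines) + "\n\n" + body
```

A `workspace-write` agent simply omits `disallowedTools`, which leaves full tool access, matching the table's fourth row.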
## Installing Agents

Add marketplace to `~/.claude/settings.json`:

```json
"extraKnownMarketplaces": {
  "codex-agents": { "source": { "source": "git", "url": "https://git.manticorum.com/cal/codex-agents.git" } }
}
```

Then:

```bash
claude plugin update codex-agents
claude plugin install docker-expert@codex-agents --scope user
```

## Agent Categories

10 categories: Core Development (12), Language Specialists (27), Infrastructure (16), Quality & Security (16), Data & AI (12), Developer Experience (13), Specialized Domains (12), Business & Product (11), Meta & Orchestration (10), Research & Analysis (7).

The awk program is double-quoted on the remote side. When `bash -s -- "$STUCK_PROC_CPU_WARN"` runs on the remote host, the remote shell will expand `$1`, `$2`, `$3` as its own positional parameters before passing the string to awk. Since only `$1` has a value (the CPU threshold), `$2` and `$3` expand to the empty string — the awk stat/pcpu/comm field references vanish and the filter never matches anything.

Fix: single-quote the awk program on the remote side. Since `COLLECTOR_SCRIPT` is already outer-single-quoted, use `'\''` to embed a literal single quote, as the merged `stuck_procs` filter now does.