Merge branch 'main' into issue/19-right-size-vm-106-docker-home-16-gb-6-8-gb-ram
cal 2026-04-06 15:40:07 +00:00
commit 5b23d92435
10 changed files with 863 additions and 7 deletions

@ -0,0 +1,80 @@
---
# gitea-cleanup.yml — Weekly cleanup of Gitea server disk space
#
# Removes stale Docker buildx volumes, unused images, Gitea repo-archive
# cache, and vacuums journal logs to prevent disk exhaustion on LXC 225.
#
# Schedule: Weekly via systemd timer on LXC 304 (ansible-controller)
#
# Usage:
# ansible-playbook /opt/ansible/playbooks/gitea-cleanup.yml # full run
# ansible-playbook /opt/ansible/playbooks/gitea-cleanup.yml --check # dry run
- name: Gitea server disk cleanup
  hosts: gitea
  gather_facts: false
  tasks:
    - name: Check current disk usage
      ansible.builtin.shell: df --output=pcent / | tail -1
      register: disk_before
      changed_when: false
    - name: Display current disk usage
      ansible.builtin.debug:
        msg: "Disk usage before cleanup: {{ disk_before.stdout | default('N/A') | trim }}"
    - name: Find Gitea repo-archive cache files
      ansible.builtin.find:
        paths: /var/lib/gitea/data/repo-archive
        file_type: any
      register: repo_archive_files
    - name: Remove repo-archive files
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ repo_archive_files.files }}"
      loop_control:
        label: "{{ item.path | basename }}"
      when: repo_archive_files.files | length > 0
    - name: Remove orphaned Docker buildx volumes
      ansible.builtin.shell: |
        volumes=$(docker volume ls -q --filter name=buildx_buildkit)
        if [ -n "$volumes" ]; then
          echo "$volumes" | xargs docker volume rm 2>&1
        else
          echo "No buildx volumes to remove"
        fi
      register: buildx_cleanup
      changed_when: "'No buildx volumes' not in buildx_cleanup.stdout"
    - name: Prune unused Docker images
      ansible.builtin.command: docker image prune -af
      register: image_prune
      changed_when: "'Total reclaimed space: 0B' not in image_prune.stdout"
    - name: Prune unused Docker volumes
      ansible.builtin.command: docker volume prune -f
      register: volume_prune
      changed_when: "'Total reclaimed space: 0B' not in volume_prune.stdout"
    - name: Vacuum journal logs to 500M
      ansible.builtin.command: journalctl --vacuum-size=500M
      register: journal_vacuum
      changed_when: "'freed 0B' not in journal_vacuum.stderr"
    - name: Check disk usage after cleanup
      ansible.builtin.shell: df --output=pcent / | tail -1
      register: disk_after
      changed_when: false
    - name: Display cleanup summary
      ansible.builtin.debug:
        msg: >-
          Cleanup complete.
          Disk: {{ disk_before.stdout | default('N/A') | trim }} → {{ disk_after.stdout | default('N/A') | trim }}.
          Buildx: {{ (buildx_cleanup.stdout_lines | default(['N/A'])) | last }}.
          Images: {{ (image_prune.stdout_lines | default(['N/A'])) | last }}.
          Journal: {{ (journal_vacuum.stderr_lines | default(['N/A'])) | last }}.

@ -1,9 +1,9 @@
---
title: "Monitoring Scripts Context"
description: "Operational context for all monitoring scripts: Jellyfin GPU health monitor, NVIDIA driver update checker, Tdarr API/file monitors, and Windows reboot detection. Includes cron schedules, Discord integration patterns, and troubleshooting."
description: "Operational context for all monitoring scripts: Proxmox backup checker, CT 302 self-health, Jellyfin GPU health monitor, NVIDIA driver update checker, Tdarr API/file monitors, and Windows reboot detection. Includes cron schedules, Discord integration patterns, and troubleshooting."
type: context
domain: monitoring
tags: [jellyfin, gpu, nvidia, tdarr, discord, cron, python, windows, scripts]
tags: [proxmox, backup, jellyfin, gpu, nvidia, tdarr, discord, cron, python, bash, windows, scripts]
---
# Monitoring Scripts - Operational Context
@ -13,6 +13,77 @@ This directory contains active operational scripts for system monitoring, health
## Core Monitoring Scripts
### Proxmox Backup Verification
**Script**: `proxmox-backup-check.sh`
**Purpose**: Weekly check that every running VM/CT has a successful vzdump backup within 7 days. Posts a color-coded Discord embed with per-guest status.
**Key Features**:
- SSHes to Proxmox host and queries `pvesh` task history + guest lists via API
- Categorizes each guest: 🟢 green (backed up), 🟡 yellow (overdue), 🔴 red (no backup)
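- Sorts output by VMID; posts only to Discord, with no local side effects beyond the cron log and a temp file (`/tmp/proxmox-backup-check-discord.out`) holding the webhook response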
- `--dry-run` mode prints the Discord payload without sending
- `--days N` overrides the default 7-day window
**Schedule**: Weekly on Monday 08:00 UTC (CT 302 cron)
```bash
0 8 * * 1 DISCORD_WEBHOOK="<url>" /root/scripts/proxmox-backup-check.sh >> /var/log/proxmox-backup-check.log 2>&1
```
**Usage**:
```bash
# Dry run (no Discord)
proxmox-backup-check.sh --dry-run
# Post to Discord
DISCORD_WEBHOOK="https://discord.com/api/webhooks/..." proxmox-backup-check.sh
# Custom window
proxmox-backup-check.sh --days 14 --discord-webhook "https://..."
```
**Dependencies**: `jq`, `curl`, SSH access to Proxmox host alias `proxmox`
**Install on CT 302**:
```bash
cp proxmox-backup-check.sh /root/scripts/
chmod +x /root/scripts/proxmox-backup-check.sh
```
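The green/yellow/red categorization listed under Key Features comes down to comparing the newest successful backup timestamp against a cutoff. The following is a minimal sketch of that comparison only, assuming epoch-second timestamps; `classify_backup` is a hypothetical helper name and not a function in `proxmox-backup-check.sh`, which does this step in `jq`.

```bash
#!/usr/bin/env bash
# Sketch only: classify_backup is a hypothetical helper, not part of the script.
# classify_backup LAST_TS CUTOFF_TS
#   LAST_TS    epoch seconds of the newest successful backup, 0 if none
#   CUTOFF_TS  epoch seconds marking the start of the recency window
classify_backup() {
  local last_ts="$1" cutoff="$2"
  if [ "$last_ts" -ge "$cutoff" ]; then
    echo "green"    # backed up inside the window
  elif [ "$last_ts" -gt 0 ]; then
    echo "yellow"   # a backup exists, but it is older than the window
  else
    echo "red"      # no successful backup found
  fi
}

now=$(date +%s)
cutoff=$(( now - 7 * 86400 ))                          # default 7-day window
classify_backup "$(( now - 2 * 86400 ))" "$cutoff"     # backup 2 days old
classify_backup "$(( now - 10 * 86400 ))" "$cutoff"    # backup 10 days old
classify_backup 0 "$cutoff"                            # never backed up
```

Passing a larger window (the equivalent of `--days 14`) only moves the cutoff; the three-way comparison stays the same.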
### CT 302 Self-Health Monitor
**Script**: `ct302-self-health.sh`
**Purpose**: Monitors disk usage on CT 302 (claude-runner) itself. Alerts to Discord when any filesystem exceeds the threshold (default 80%). Runs silently when healthy — no Discord spam on green.
**Key Features**:
- Checks all non-virtual filesystems (`df`, excludes tmpfs/devtmpfs/overlay/udev)
- Only sends a Discord alert when a filesystem is at or above threshold
- `--always-post` flag forces a post even when healthy (useful for testing)
- `--dry-run` mode prints payload without sending
**Schedule**: Daily at 07:00 UTC (CT 302 cron)
```bash
0 7 * * * DISCORD_WEBHOOK="<url>" /root/scripts/ct302-self-health.sh >> /var/log/ct302-self-health.log 2>&1
```
**Usage**:
```bash
# Check and alert if over 80%
DISCORD_WEBHOOK="https://discord.com/api/webhooks/..." ct302-self-health.sh
# Lower threshold test
ct302-self-health.sh --threshold 50 --dry-run
# Always post (weekly status report pattern)
ct302-self-health.sh --always-post --discord-webhook "https://..."
```
**Dependencies**: `jq`, `curl`, `df`
**Install on CT 302**:
```bash
cp ct302-self-health.sh /root/scripts/
chmod +x /root/scripts/ct302-self-health.sh
```
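The threshold check described above can be tried without touching Discord. This is a rough sketch rather than the script's own code: `over_threshold` is a hypothetical helper, and the sample rows stand in for real `df --output=source,pcent,target` output so the result is reproducible.

```bash
#!/usr/bin/env bash
# Sketch only: over_threshold is a hypothetical helper, not part of the script.
# over_threshold THRESHOLD
#   Reads "source pcent target" rows on stdin and prints the rows whose
#   usage is at or above THRESHOLD.
over_threshold() {
  local threshold="$1" fs pct mount
  while read -r fs pct mount; do
    pct="${pct%\%}"                       # strip the trailing % sign
    case "$pct" in
      ''|*[!0-9]*) continue ;;            # skip header or malformed rows
    esac
    if [ "$pct" -ge "$threshold" ]; then
      echo "${pct}% used on ${mount} (${fs})"
    fi
  done
}

# Sample rows instead of live df output; virtual filesystems are dropped first.
printf '%s\n' \
  '/dev/sda1 91% /' \
  '/dev/sdb1 42% /data' \
  'tmpfs 3% /run' |
  awk '$1 !~ /^(tmpfs|devtmpfs|overlay|udev)/' |
  over_threshold 80
```

To run it against the live system, replace the `printf` stage with `df --output=source,pcent,target | tail -n +2`.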
### Jellyfin GPU Health Monitor
**Script**: `jellyfin_gpu_monitor.py`
**Purpose**: Monitor Jellyfin container GPU access with Discord alerts and auto-restart capability
@ -235,6 +306,17 @@ python3 tdarr_file_monitor.py >> /mnt/NV2/Development/claude-home/logs/tdarr-fil
0 9 * * 1 /usr/bin/python3 /home/cal/scripts/nvidia_update_checker.py --check --discord-alerts >> /home/cal/logs/nvidia-update-checker.log 2>&1
```
**Active Cron Jobs** (on CT 302 / claude-runner, root user):
```bash
# Proxmox backup verification - Weekly (Mondays at 8 AM UTC)
0 8 * * 1 DISCORD_WEBHOOK="<homelab-alerts-webhook>" /root/scripts/proxmox-backup-check.sh >> /var/log/proxmox-backup-check.log 2>&1
# CT 302 self-health disk check - Daily at 7 AM UTC (alerts only when >80%)
0 7 * * * DISCORD_WEBHOOK="<homelab-alerts-webhook>" /root/scripts/ct302-self-health.sh >> /var/log/ct302-self-health.log 2>&1
```
**Note**: Scripts must be installed manually on CT 302. Source of truth is `monitoring/scripts/` in this repo — copy to `/root/scripts/` on CT 302 to deploy.
**Manual/On-Demand**:
- `tdarr_monitor.py` - Run as needed for Tdarr health checks
- `tdarr_file_monitor.py` - Can be scheduled if automatic backup needed

@ -0,0 +1,158 @@
#!/usr/bin/env bash
# ct302-self-health.sh — CT 302 (claude-runner) disk self-check → Discord
#
# Monitors disk usage on CT 302 itself and alerts to Discord when any
# filesystem exceeds the threshold. Closes the blind spot where the
# monitoring system cannot monitor itself via external health checks.
#
# Designed to run silently when healthy (no Discord spam on green).
# Only posts when a filesystem is at or above THRESHOLD.
#
# Usage:
# ct302-self-health.sh [--discord-webhook URL] [--threshold N] [--dry-run] [--always-post]
#
# Environment overrides:
# DISCORD_WEBHOOK Discord webhook URL (required unless --dry-run)
# DISK_THRESHOLD Disk usage % alert threshold (default: 80)
#
# Install on CT 302 (daily, 07:00 UTC):
# 0 7 * * * DISCORD_WEBHOOK="<url>" /root/scripts/ct302-self-health.sh >> /var/log/ct302-self-health.log 2>&1
set -uo pipefail
DISK_THRESHOLD="${DISK_THRESHOLD:-80}"
DISCORD_WEBHOOK="${DISCORD_WEBHOOK:-}"
DRY_RUN=0
ALWAYS_POST=0
while [[ $# -gt 0 ]]; do
case "$1" in
--discord-webhook)
if [[ $# -lt 2 ]]; then
echo "Error: --discord-webhook requires a value" >&2
exit 1
fi
DISCORD_WEBHOOK="$2"
shift 2
;;
--threshold)
if [[ $# -lt 2 ]]; then
echo "Error: --threshold requires a value" >&2
exit 1
fi
DISK_THRESHOLD="$2"
shift 2
;;
--dry-run)
DRY_RUN=1
shift
;;
--always-post)
ALWAYS_POST=1
shift
;;
*)
echo "Unknown option: $1" >&2
exit 1
;;
esac
done
if [[ "$DRY_RUN" -eq 0 && -z "$DISCORD_WEBHOOK" ]]; then
echo "Error: DISCORD_WEBHOOK not set. Use --discord-webhook URL or set env var." >&2
exit 1
fi
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
# ---------------------------------------------------------------------------
# Check disk usage on all real filesystems
# ---------------------------------------------------------------------------
# df output: Filesystem Use% Mounted-on (skipping tmpfs, devtmpfs, overlay)
TRIGGERED=()
ALL_FS=()
while IFS= read -r line; do
fs=$(echo "$line" | awk '{print $1}')
pct=$(echo "$line" | awk '{print $2}' | tr -d '%')
mount=$(echo "$line" | awk '{print $3}')
ALL_FS+=("${pct}% ${mount} (${fs})")
if [[ "$pct" -ge "$DISK_THRESHOLD" ]]; then
TRIGGERED+=("${pct}% used — ${mount} (${fs})")
fi
done < <(df -h --output=source,size,used,avail,pcent,target |
tail -n +2 |
awk '$1 !~ /^(tmpfs|devtmpfs|overlay|udev)/' |
awk '{print $1, $5, $6}')
HOSTNAME=$(hostname -s)
TRIGGERED_COUNT=${#TRIGGERED[@]}
log "Disk check complete: ${TRIGGERED_COUNT} filesystem(s) above ${DISK_THRESHOLD}%"
# Exit cleanly with no Discord post if everything is healthy
if [[ "$TRIGGERED_COUNT" -eq 0 && "$ALWAYS_POST" -eq 0 && "$DRY_RUN" -eq 0 ]]; then
log "All filesystems healthy — no alert needed."
exit 0
fi
# ---------------------------------------------------------------------------
# Build Discord payload
# ---------------------------------------------------------------------------
if [[ "$TRIGGERED_COUNT" -gt 0 ]]; then
EMBED_COLOR=15548997 # 0xED4245 red
TITLE="🔴 ${HOSTNAME}: Disk usage above ${DISK_THRESHOLD}%"
alert_lines=$(printf '⚠️ %s\n' "${TRIGGERED[@]}")
FIELDS=$(jq -n \
--arg name "Filesystems Over Threshold" \
--arg value "$alert_lines" \
'[{"name": $name, "value": $value, "inline": false}]')
else
EMBED_COLOR=5763719 # 0x57F287 green
TITLE="🟢 ${HOSTNAME}: All filesystems healthy"
FIELDS='[]'
fi
# Add summary of all filesystems
all_lines=$(printf '%s\n' "${ALL_FS[@]}")
FIELDS=$(echo "$FIELDS" | jq \
--arg name "All Filesystems" \
--arg value "$all_lines" \
'. + [{"name": $name, "value": $value, "inline": false}]')
FOOTER="$(date -u '+%Y-%m-%d %H:%M UTC') · CT 302 self-health · threshold: ${DISK_THRESHOLD}%"
PAYLOAD=$(jq -n \
--arg title "$TITLE" \
--argjson color "$EMBED_COLOR" \
--argjson fields "$FIELDS" \
--arg footer "$FOOTER" \
'{
"embeds": [{
"title": $title,
"color": $color,
"fields": $fields,
"footer": {"text": $footer}
}]
}')
if [[ "$DRY_RUN" -eq 1 ]]; then
log "DRY RUN — Discord payload:"
echo "$PAYLOAD" | jq .
exit 0
fi
log "Posting to Discord..."
HTTP_STATUS=$(curl -s -o /tmp/ct302-self-health-discord.out \
-w "%{http_code}" \
-X POST "$DISCORD_WEBHOOK" \
-H "Content-Type: application/json" \
-d "$PAYLOAD")
if [[ "$HTTP_STATUS" -ge 200 && "$HTTP_STATUS" -lt 300 ]]; then
log "Discord notification sent (HTTP ${HTTP_STATUS})."
else
log "Warning: Discord returned HTTP ${HTTP_STATUS}."
cat /tmp/ct302-self-health-discord.out >&2
exit 1
fi

@ -5,7 +5,7 @@
# to collect system metrics, then generates a summary report.
#
# Usage:
# homelab-audit.sh [--output-dir DIR]
# homelab-audit.sh [--output-dir DIR] [--hosts label:ip,label:ip,...]
#
# Environment overrides:
# STUCK_PROC_CPU_WARN CPU% at which a D-state process is flagged (default: 10)
@ -29,7 +29,6 @@ LOAD_WARN=2.0
MEM_WARN=85
ZOMBIE_WARN=1
SWAP_WARN=512
HOSTS_FILTER="" # comma-separated host list from --hosts; empty = audit all
JSON_OUTPUT=0 # set to 1 by --json

@ -0,0 +1,230 @@
#!/usr/bin/env bash
# proxmox-backup-check.sh — Weekly Proxmox backup verification → Discord
#
# SSHes to the Proxmox host and checks that every running VM/CT has a
# successful vzdump backup within the last 7 days. Posts a color-coded
# Discord summary with per-guest status.
#
# Usage:
# proxmox-backup-check.sh [--discord-webhook URL] [--days N] [--dry-run]
#
# Environment overrides:
# DISCORD_WEBHOOK Discord webhook URL (required unless --dry-run)
# PROXMOX_NODE Proxmox node name (default: proxmox)
# PROXMOX_SSH SSH alias or host for Proxmox (default: proxmox)
# WINDOW_DAYS Backup recency window in days (default: 7)
#
# Install on CT 302 (weekly, Monday 08:00 UTC):
# 0 8 * * 1 DISCORD_WEBHOOK="<url>" /root/scripts/proxmox-backup-check.sh >> /var/log/proxmox-backup-check.log 2>&1
set -uo pipefail
PROXMOX_NODE="${PROXMOX_NODE:-proxmox}"
PROXMOX_SSH="${PROXMOX_SSH:-proxmox}"
WINDOW_DAYS="${WINDOW_DAYS:-7}"
DISCORD_WEBHOOK="${DISCORD_WEBHOOK:-}"
DRY_RUN=0
while [[ $# -gt 0 ]]; do
case "$1" in
--discord-webhook)
if [[ $# -lt 2 ]]; then
echo "Error: --discord-webhook requires a value" >&2
exit 1
fi
DISCORD_WEBHOOK="$2"
shift 2
;;
--days)
if [[ $# -lt 2 ]]; then
echo "Error: --days requires a value" >&2
exit 1
fi
WINDOW_DAYS="$2"
shift 2
;;
--dry-run)
DRY_RUN=1
shift
;;
*)
echo "Unknown option: $1" >&2
exit 1
;;
esac
done
if [[ "$DRY_RUN" -eq 0 && -z "$DISCORD_WEBHOOK" ]]; then
echo "Error: DISCORD_WEBHOOK not set. Use --discord-webhook URL or set env var." >&2
exit 1
fi
if ! command -v jq &>/dev/null; then
echo "Error: jq is required but not installed." >&2
exit 1
fi
SSH_OPTS="-o StrictHostKeyChecking=accept-new -o ConnectTimeout=10 -o BatchMode=yes"
CUTOFF=$(date -d "-${WINDOW_DAYS} days" +%s)
NOW=$(date +%s)
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
# ---------------------------------------------------------------------------
# Fetch data from Proxmox
# ---------------------------------------------------------------------------
log "Fetching VM and CT list from Proxmox node '${PROXMOX_NODE}'..."
VMS_JSON=$(ssh $SSH_OPTS "$PROXMOX_SSH" \
"pvesh get /nodes/${PROXMOX_NODE}/qemu --output-format json 2>/dev/null" || echo "[]")
CTS_JSON=$(ssh $SSH_OPTS "$PROXMOX_SSH" \
"pvesh get /nodes/${PROXMOX_NODE}/lxc --output-format json 2>/dev/null" || echo "[]")
log "Fetching recent vzdump task history (limit 200)..."
TASKS_JSON=$(ssh $SSH_OPTS "$PROXMOX_SSH" \
"pvesh get /nodes/${PROXMOX_NODE}/tasks --typefilter vzdump --limit 200 --output-format json 2>/dev/null" || echo "[]")
# ---------------------------------------------------------------------------
# Build per-guest backup status
# ---------------------------------------------------------------------------
# Merge VMs and CTs into one list: [{vmid, name, type}]
GUESTS_JSON=$(jq -n \
--argjson vms "$VMS_JSON" \
--argjson cts "$CTS_JSON" '
($vms | map(select(.status == "running") | {vmid: (.vmid | tostring), name, type: "VM"})) +
($cts | map(select(.status == "running") | {vmid: (.vmid | tostring), name, type: "CT"}))
')
GUEST_COUNT=$(echo "$GUESTS_JSON" | jq 'length')
log "Found ${GUEST_COUNT} running guests."
# For each guest, find the most recent successful (status == "OK") vzdump task
RESULTS=$(jq -n \
--argjson guests "$GUESTS_JSON" \
--argjson tasks "$TASKS_JSON" \
--argjson cutoff "$CUTOFF" \
--argjson now "$NOW" \
--argjson window "$WINDOW_DAYS" '
$guests | map(
. as $g |
($tasks | map(
select(
(.vmid | tostring) == $g.vmid
and .status == "OK"
) | .starttime
) | max // 0) as $last_ts |
{
vmid: $g.vmid,
name: $g.name,
type: $g.type,
last_backup_ts: $last_ts,
age_days: (if $last_ts > 0 then (($now - $last_ts) / 86400 | floor) else -1 end),
status: (
if $last_ts >= $cutoff then "green"
elif $last_ts > 0 then "yellow"
else "red"
end
)
}
) | sort_by(.vmid | tonumber)
')
GREEN_GUESTS=$(echo "$RESULTS" | jq '[.[] | select(.status == "green")]')
YELLOW_GUESTS=$(echo "$RESULTS" | jq '[.[] | select(.status == "yellow")]')
RED_GUESTS=$(echo "$RESULTS" | jq '[.[] | select(.status == "red")]')
GREEN_COUNT=$(echo "$GREEN_GUESTS" | jq 'length')
YELLOW_COUNT=$(echo "$YELLOW_GUESTS" | jq 'length')
RED_COUNT=$(echo "$RED_GUESTS" | jq 'length')
log "Results: ${GREEN_COUNT} green, ${YELLOW_COUNT} yellow, ${RED_COUNT} red"
# ---------------------------------------------------------------------------
# Build Discord payload
# ---------------------------------------------------------------------------
if [[ "$RED_COUNT" -gt 0 ]]; then
EMBED_COLOR=15548997 # 0xED4245 red
STATUS_LINE="🔴 Backup issues detected — action required"
elif [[ "$YELLOW_COUNT" -gt 0 ]]; then
EMBED_COLOR=16705372 # 0xFEE75C yellow
STATUS_LINE="🟡 Some backups are overdue (>${WINDOW_DAYS}d)"
else
EMBED_COLOR=5763719 # 0x57F287 green
STATUS_LINE="🟢 All ${GUEST_COUNT} guests backed up within ${WINDOW_DAYS} days"
fi
# Format guest lines: "VM 116 (plex) — 2d ago" (format_guest_with_age) or "CT 302 (claude-runner)" (format_guest)
format_guest() {
local prefix="$1" guests="$2"
echo "$guests" | jq -r '.[] | "\(.type) \(.vmid) (\(.name))"' |
while IFS= read -r line; do echo "${prefix} ${line}"; done
}
format_guest_with_age() {
local prefix="$1" guests="$2"
echo "$guests" | jq -r '.[] | "\(.type) \(.vmid) (\(.name)) — \(.age_days)d ago"' |
while IFS= read -r line; do echo "${prefix} ${line}"; done
}
# Build fields array
fields='[]'
if [[ "$GREEN_COUNT" -gt 0 ]]; then
green_lines=$(format_guest_with_age "✅" "$GREEN_GUESTS")
fields=$(echo "$fields" | jq \
--arg name "🟢 Healthy (${GREEN_COUNT})" \
--arg value "$green_lines" \
'. + [{"name": $name, "value": $value, "inline": false}]')
fi
if [[ "$YELLOW_COUNT" -gt 0 ]]; then
yellow_lines=$(format_guest_with_age "⚠️" "$YELLOW_GUESTS")
fields=$(echo "$fields" | jq \
--arg name "🟡 Overdue — last backup >${WINDOW_DAYS}d ago (${YELLOW_COUNT})" \
--arg value "$yellow_lines" \
'. + [{"name": $name, "value": $value, "inline": false}]')
fi
if [[ "$RED_COUNT" -gt 0 ]]; then
red_lines=$(format_guest "❌" "$RED_GUESTS")
fields=$(echo "$fields" | jq \
--arg name "🔴 No Successful Backups Found (${RED_COUNT})" \
--arg value "$red_lines" \
'. + [{"name": $name, "value": $value, "inline": false}]')
fi
FOOTER="$(date -u '+%Y-%m-%d %H:%M UTC') · ${GUEST_COUNT} guests · window: ${WINDOW_DAYS}d"
PAYLOAD=$(jq -n \
--arg title "Proxmox Backup Check — ${STATUS_LINE}" \
--argjson color "$EMBED_COLOR" \
--argjson fields "$fields" \
--arg footer "$FOOTER" \
'{
"embeds": [{
"title": $title,
"color": $color,
"fields": $fields,
"footer": {"text": $footer}
}]
}')
if [[ "$DRY_RUN" -eq 1 ]]; then
log "DRY RUN — Discord payload:"
echo "$PAYLOAD" | jq .
exit 0
fi
log "Posting to Discord..."
HTTP_STATUS=$(curl -s -o /tmp/proxmox-backup-check-discord.out \
-w "%{http_code}" \
-X POST "$DISCORD_WEBHOOK" \
-H "Content-Type: application/json" \
-d "$PAYLOAD")
if [[ "$HTTP_STATUS" -ge 200 && "$HTTP_STATUS" -lt 300 ]]; then
log "Discord notification sent (HTTP ${HTTP_STATUS})."
else
log "Warning: Discord returned HTTP ${HTTP_STATUS}."
cat /tmp/proxmox-backup-check-discord.out >&2
exit 1
fi

@ -93,6 +93,34 @@ else
fail "disk_usage" "expected 'N /path', got: '$result'"
fi
# --- --hosts flag parsing ---
echo ""
echo "=== --hosts argument parsing tests ==="
# Single host
input="vm-115:10.10.0.88"
IFS=',' read -ra entries <<<"$input"
label="${entries[0]%%:*}"
addr="${entries[0]#*:}"
if [[ "$label" == "vm-115" && "$addr" == "10.10.0.88" ]]; then
pass "--hosts single entry parsed: $label $addr"
else
fail "--hosts single" "expected 'vm-115 10.10.0.88', got: '$label $addr'"
fi
# Multiple hosts
input="vm-115:10.10.0.88,lxc-225:10.10.0.225"
IFS=',' read -ra entries <<<"$input"
label1="${entries[0]%%:*}"
addr1="${entries[0]#*:}"
label2="${entries[1]%%:*}"
addr2="${entries[1]#*:}"
if [[ "$label1" == "vm-115" && "$addr1" == "10.10.0.88" && "$label2" == "lxc-225" && "$addr2" == "10.10.0.225" ]]; then
pass "--hosts multi entry parsed: $label1 $addr1, $label2 $addr2"
else
fail "--hosts multi" "unexpected parse result"
fi
echo ""
echo "=== Results: $PASS passed, $FAIL failed ==="
((FAIL == 0))

@ -178,7 +178,7 @@ When merging many PRs at once (e.g., batch pagination PRs), branch protection ru
| `LOG_LEVEL` | Logging verbosity (default: INFO) |
| `DATABASE_TYPE` | `postgresql` |
| `POSTGRES_HOST` | Container name of PostgreSQL |
| `POSTGRES_DB` | Database name (`pd_master`) |
| `POSTGRES_DB` | Database name `pd_master` (prod) / `paperdynasty_dev` (dev) |
| `POSTGRES_USER` | DB username |
| `POSTGRES_PASSWORD` | DB password |
@ -189,4 +189,6 @@ When merging many PRs at once (e.g., batch pagination PRs), branch protection ru
| Database API (prod) | `ssh akamai` | `pd_api` | 815 |
| Database API (dev) | `ssh pd-database` | `dev_pd_database` | 813 |
| PostgreSQL (prod) | `ssh akamai` | `pd_postgres` | 5432 |
| PostgreSQL (dev) | `ssh pd-database` | `pd_postgres` | 5432 |
| PostgreSQL (dev) | `ssh pd-database` | `sba_postgres` | 5432 |
**Dev database credentials:** container `sba_postgres`, database `paperdynasty_dev`, user `sba_admin`. Prod uses `pd_postgres`, database `pd_master`.

@ -0,0 +1,170 @@
---
title: "Discord Bot Browser Testing via Playwright + CDP"
description: "Step-by-step workflow for automated Discord bot testing using Playwright connected to Brave browser via Chrome DevTools Protocol. Covers setup, slash command execution, and screenshot capture."
type: runbook
domain: paper-dynasty
tags: [paper-dynasty, discord, testing, playwright, automation]
---
# Discord Bot Browser Testing via Playwright + CDP
Automated testing of Paper Dynasty Discord bot commands by connecting Playwright to a running Brave browser instance with Discord open.
## Prerequisites
- Brave browser installed (`brave-browser-stable`)
- Playwright installed (`pip install playwright && playwright install chromium`)
- Discord logged in via browser (not desktop app)
- Discord bot running (locally via docker-compose or on remote host)
- Bot's `API_TOKEN` must match the target API environment
## Setup
### 1. Launch Brave with CDP enabled
Brave must be started with `--remote-debugging-port`. If Brave is already running, **kill it first** — otherwise the flag is ignored and the new process merges into the existing one.
```bash
killall brave 2>/dev/null; sleep 2; brave-browser-stable --remote-debugging-port=9222 &
```
### 2. Verify CDP is responding
```bash
curl -s http://localhost:9222/json/version | python3 -m json.tool
```
Should return JSON with `Browser`, `webSocketDebuggerUrl`, etc.
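
Brave can take a moment to open the debugging port after launch, so a single `curl` may fail even when the flag was applied. A small polling loop is one way to handle that; the sketch below assumes the same `http://localhost:9222/json/version` endpoint, and `wait_for_cdp` is a hypothetical helper name introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_cdp is a hypothetical helper, not part of this runbook's tooling.
# wait_for_cdp [URL] [ATTEMPTS]
#   Polls the CDP version endpoint once per second until it answers.
wait_for_cdp() {
  local url="${1:-http://localhost:9222/json/version}"
  local attempts="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; --max-time bounds each try
    if curl -sf --max-time 2 "$url" >/dev/null; then
      echo "CDP ready after ${i} attempt(s)"
      return 0
    fi
    sleep 1
    i=$(( i + 1 ))
  done
  echo "CDP not responding at ${url}" >&2
  return 1
}
```

Calling `wait_for_cdp` right after launching Brave, and connecting Playwright only when it returns 0, avoids a connection error caused by starting too early.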
### 3. Open Discord in browser
Navigate to `https://discord.com/channels/<server_id>/<channel_id>` in Brave.
**Paper Dynasty test server:**
- Server: Cals Test Server (`669356687294988350`)
- Channel: #pd-game-test (`982850262903451658`)
- URL: `https://discord.com/channels/669356687294988350/982850262903451658`
### 4. Verify bot is running with correct API token
```bash
# Check docker-compose.yml has the right API_TOKEN for the target environment
grep API_TOKEN /mnt/NV2/Development/paper-dynasty/discord-app/docker-compose.yml
# The dev API token lives on the dev host; this only confirms the dev database there is reachable:
ssh pd-database "docker exec sba_postgres psql -U sba_admin -d paperdynasty_dev -c \"SELECT 1;\""
# Restart bot if token was changed:
cd /mnt/NV2/Development/paper-dynasty/discord-app && docker compose up -d
```
## Running Commands
### Find the Discord tab
```python
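from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Attach to the Brave instance started with --remote-debugging-port
    browser = p.chromium.connect_over_cdp('http://localhost:9222')
    for ctx in browser.contexts:
        for page in ctx.pages:
            if 'discord' in page.url.lower():
                print(f'Found: {page.url}')
                break  # first Discord tab in this context is enough
    # On a CDP connection this disconnects Playwright from the browser
    browser.close()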
```
### Execute a slash command and capture result
```python
from playwright.sync_api import sync_playwright
import time


def run_slash_command(command: str, wait_seconds: int = 5, screenshot_path: str = '/tmp/discord_result.png'):
    """
    Type a slash command in Discord, select the top autocomplete option,
    submit it, wait for the bot response, and take a screenshot.
    """
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp('http://localhost:9222')
        # Use the first open tab whose URL mentions Discord, so the
        # command is sent once even if several Discord tabs are open
        page = next(
            (pg for ctx in browser.contexts for pg in ctx.pages
             if 'discord' in pg.url.lower()),
            None,
        )
        if page is None:
            print('No Discord tab found')
        else:
            msg_box = page.locator('[role="textbox"][data-slate-editor="true"]')
            msg_box.click()
            time.sleep(0.3)
            # Type the command (delay simulates human typing for autocomplete)
            msg_box.type(command, delay=80)
            time.sleep(2)
            # Tab selects the top autocomplete option
            page.keyboard.press('Tab')
            time.sleep(1)
            # Enter submits the command
            page.keyboard.press('Enter')
            time.sleep(wait_seconds)
            page.screenshot(path=screenshot_path)
            print(f'Screenshot saved to {screenshot_path}')
        browser.close()


# Example usage:
run_slash_command('/refractor status')
```
### Commands with parameters
After pressing Tab to select the command, Discord shows an options panel. To fill parameters:
1. The first parameter input is auto-focused after Tab
2. Type the value, then Tab to move to the next parameter
3. Press Enter when ready to submit
```python
# Example: /refractor status (fragment; assumes msg_box and page from run_slash_command above)
msg_box.type('/refractor status', delay=80)
time.sleep(2)
page.keyboard.press('Tab') # Select command from autocomplete
time.sleep(1)
# Now fill parameters if needed, or just submit
page.keyboard.press('Enter')
```
## Key Selectors
| Element | Selector |
|---------|----------|
| Message input box | `[role="textbox"][data-slate-editor="true"]` |
| Autocomplete popup | `[class*="autocomplete"]` |
## Gotchas
- **Brave must be killed before relaunch** — if an instance is already running, `--remote-debugging-port` is silently ignored
- **Bot token mismatch** — the bot's `API_TOKEN` in `docker-compose.yml` must match the target API (dev or prod). Symptoms: `{"detail":"Unauthorized"}` in bot logs
- **Viewport is None** — when connecting via CDP, `page.viewport_size` returns None. Use `page.evaluate('() => ({w: window.innerWidth, h: window.innerHeight})')` instead
- **Autocomplete timing** — typing too fast may not trigger Discord's autocomplete. The `delay=80` on `msg_box.type()` simulates human speed
- **Multiple bots** — if multiple bots register the same slash command (e.g. MantiTestBot and PucklTestBot), Tab selects the top option. Verify the correct bot name in the autocomplete popup before proceeding
## Test Plan Reference
The Refractor integration test plan is at:
`discord-app/tests/refractor-integration-test-plan.md`
Key test case groups:
- REF-01 to REF-06: Tier badges and display
- REF-10 to REF-15: Progress bars and filtering
- REF-40 to REF-42: Cross-command badges (card, roster)
- REF-70 to REF-72: Cross-command badge propagation (the current priority)
## Verified On
- **Date:** 2026-04-06
- **Browser:** Brave 146.0.7680.178 (Chromium-based)
- **Playwright:** Node.js driver via Python sync API
- **Bot:** MantiTestBot on Cals Test Server, #pd-game-test channel
- **API:** pddev.manticorum.com (dev environment)

@ -0,0 +1,107 @@
---
title: "Refractor In-App Test Plan"
description: "Comprehensive manual test plan for the Refractor card evolution system — covers /refractor status, tier badges, post-game hooks, tier-up notifications, card art tiers, and known issues."
type: guide
domain: paper-dynasty
tags: [paper-dynasty, testing, refractor, discord, database]
---
# Refractor In-App Test Plan
Manual test plan for the Refractor (card evolution) system. All testing targets **dev** environment (`pddev.manticorum.com` / dev Discord bot).
## Prerequisites
- Dev bot running on `sba-bots`
- Dev API at `pddev.manticorum.com` (port 813)
- Team with seeded refractor data (team 31 from prior session)
- At least one game playable to trigger post-game hooks
---
## REF-10: `/refractor status` — Basic Display
| # | Test | Steps | Expected |
|---|---|---|---|
| 10 | No filters | `/refractor status` | Ephemeral embed with team branding, tier summary line, 10 cards sorted by tier DESC, pagination buttons if >10 cards |
| 11 | Card type filter | `/refractor status card_type:Batter` | Only batter cards shown, count matches |
| 12 | Tier filter | `/refractor status tier:T2—Refractor` | Only T2 cards, embed color changes to tier color |
| 13 | Progress filter | `/refractor status progress:Close to next tier` | Only cards >=80% to next threshold, fully evolved excluded |
| 14 | Combined filters | `/refractor status card_type:Batter tier:T1—Base Chrome` | Intersection of both filters |
| 15 | Empty result | `/refractor status tier:T4—Superfractor` (if none exist) | "No cards match your filters..." message with filter details |
## REF-20: `/refractor status` — Pagination
| # | Test | Steps | Expected |
|---|---|---|---|
| 20 | Page buttons appear | `/refractor status` with >10 cards | Prev/Next buttons visible |
| 21 | Next page | Click `Next >` | Page 2 shown, footer updates to "Page 2/N" |
| 22 | Prev page | From page 2, click `< Prev` | Back to page 1 |
| 23 | First page prev | On page 1, click `< Prev` | Nothing happens / stays on page 1 |
| 24 | Last page next | On last page, click `Next >` | Nothing happens / stays on last page |
| 25 | Button timeout | Wait 120s after command | Buttons become unresponsive |
| 26 | Wrong user clicks | Another user clicks buttons | Silently ignored |
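
For REF-21, the expected `N` in "Page 2/N" can be worked out ahead of time from the team's card count. The sketch below is a ceiling division that assumes 10 cards per page, as implied by REF-10; `expected_pages` is a hypothetical helper for the tester, not part of the bot.

```bash
#!/usr/bin/env bash
# Sketch only: expected_pages is a hypothetical helper for test preparation.
# expected_pages TOTAL [PER_PAGE]
#   Ceiling division; PER_PAGE defaults to 10 (assumption taken from REF-10).
expected_pages() {
  local total="$1" per_page="${2:-10}"
  echo $(( (total + per_page - 1) / per_page ))
}

expected_pages 10    # exactly one full page
expected_pages 11    # one card spills onto a second page
expected_pages 25    # two full pages plus a partial third
```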
## REF-30: Tier Badges in Card Embeds
| # | Test | Steps | Expected |
|---|---|---|---|
| 30 | T0 card display | View a T0 card via `/myteam` or `/roster` | No badge prefix, just player name |
| 31 | T1 badge | View a T1 card | Title shows `[BC] Player Name` |
| 32 | T2 badge | View a T2 card | Title shows `[R] Player Name` |
| 33 | T3 badge | View a T3 card | Title shows `[GR] Player Name` |
| 34 | T4 badge | View a T4 card (if exists) | Title shows `[SF] Player Name` |
| 35 | Badge in pack open | Open a pack with an evolved card | Badge appears in pack embed |
| 36 | API down gracefully | (hard to test) | Card displays normally with no badge, no error |
## REF-50: Post-Game Hook & Tier-Up Notifications
| # | Test | Steps | Expected |
|---|---|---|---|
| 50 | Game completes normally | Play a full game | No errors in bot logs; refractor evaluate-game fires after season-stats update |
| 51 | Tier-up notification | Play game where a card crosses a threshold | Embed in game channel: "Refractor Tier Up!", player name, tier name, correct color |
| 52 | No tier-up | Play game where no thresholds crossed | No refractor embed posted, game completes normally |
| 53 | Multiple tier-ups | Game where 2+ players tier up | One embed per tier-up, all posted |
| 54 | Auto-init new card | Play game with a card that has no RefractorCardState | State created automatically, player evaluated, no error |
| 55 | Superfractor notification | (may need forced data) | "SUPERFRACTOR!" title, teal color |
## REF-60: Card Art with Tiers (API-level)
| # | Test | Steps | Expected |
|---|---|---|---|
| 60 | T0 card image | `GET /api/v2/players/{id}/card-image?card_type=batting` | Base card, no tier styling |
| 61 | Tier override | `GET ...?card_type=batting&tier=2` | Refractor styling visible (border, diamond indicator) |
| 62 | Each tier visual | `?tier=1` through `?tier=4` | Correct border colors, diamond fill, header gradients per tier |
| 63 | Pitcher card | `?card_type=pitching&tier=2` | Tier styling applies correctly to pitcher layout |
## REF-70: Known Issues to Verify
| # | Issue | Check | Status |
|---|---|---|---|
| 70 | Superfractor embed says "Rating boosts coming in a future update!" | Verify — boosts ARE implemented now, text is stale | **Fix needed** |
| 71 | `on_timeout` doesn't edit message | Buttons stay visually active after 120s | **Known, low priority** |
| 72 | Card embed perf (1 API call per card) | Note latency on roster views with 10+ cards | **Monitor** |
| 73 | Season-stats failure kills refractor eval | Both in same try/except | **Known risk, verify logging** |
---
## API Endpoints Under Test
| Method | Endpoint | Used By |
|---|---|---|
| GET | `/api/v2/refractor/tracks` | Track listing |
| GET | `/api/v2/refractor/cards?team_id=X` | `/refractor status` command |
| GET | `/api/v2/refractor/cards/{card_id}` | Tier badge in card embeds |
| POST | `/api/v2/refractor/cards/{card_id}/evaluate` | Force re-evaluation |
| POST | `/api/v2/refractor/evaluate-game/{game_id}` | Post-game hook |
| GET | `/api/v2/teams/{team_id}/refractors` | Teams alias endpoint |
| GET | `/api/v2/players/{id}/card-image?tier=N` | Card art tier preview |
## Notification Embed Colors
| Tier | Name | Color |
|---|---|---|
| T1 | Base Chrome | Green (0x2ECC71) |
| T2 | Refractor | Gold (0xF1C40F) |
| T3 | Gold Refractor | Purple (0x9B59B6) |
| T4 | Superfractor | Teal (0x1ABC9C) |
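
Discord embed payloads carry `color` as a decimal integer, so a raw payload or log line may show a number rather than the hex values in the table. A short sketch for converting in both directions when checking REF-51 and REF-55 follows; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for comparing hex colors with decimal values.
hex_to_decimal() { printf '%d\n' "$1"; }        # 0xRRGGBB -> decimal
decimal_to_hex() { printf '0x%06X\n' "$1"; }    # decimal  -> 0xRRGGBB

hex_to_decimal 0x2ECC71    # T1 Base Chrome
hex_to_decimal 0xF1C40F    # T2 Refractor
hex_to_decimal 0x9B59B6    # T3 Gold Refractor
hex_to_decimal 0x1ABC9C    # T4 Superfractor
```

`decimal_to_hex` goes the other way, for example turning a decimal value copied from a payload back into the notation used in the table.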

@ -12,5 +12,5 @@ ostype: l26
scsi0: local-lvm:vm-115-disk-0,size=256G
scsihw: virtio-scsi-pci
smbios1: uuid=19be98ee-f60d-473d-acd2-9164717fcd11
sockets: 2
sockets: 1
vmgenid: 682dfeab-8c63-4f0b-8ed2-8828c2f808ef