Investigate manticore zombie processes and swap usage #30

Closed
opened 2026-04-03 01:10:02 +00:00 by cal · 0 comments
Owner

Context

The audit found 10 zombie processes and 978 MB swap usage on manticore (ubuntu-manticore, 32 GB RAM physical server). While neither is critical, the combination suggests potential past memory pressure events.

Investigation Tasks

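  • Identify the zombie processes and their parents: ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/' (ps aux does not show the PPID, and an exact match on "Z" misses states such as Z+ or Zs). Which parent process is not reaping them?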
  • Check OOM kill history: dmesg | grep -i "oom-kill" and journalctl -k | grep -i oom
  • Check what's consuming swap: smem -rs swap | head -20 or for f in /proc/[0-9]*/status; do awk '/^VmSwap:/{if($2>0) print FILENAME,$0}' "$f"; done 2>/dev/null | sort -k3 -rn | head -20
  • Determine whether swap usage is residual (pages swapped out once and left idle) or active (nonzero si/so columns in vmstat 1)
  • Check if Tdarr (122% CPU during audit) is the parent of the zombies
  • If the zombies come from Tdarr worker processes, a Tdarr restart or config fix may be needed
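
The zombie-parent and swap-activity checks above can be sketched as two small filters. This is only a rough sketch, not something that was run on manticore: the function names zombies_by_parent and swap_activity are made up for illustration, and the vmstat parsing assumes the default procps column layout (si and so as columns 7 and 8). Both read from stdin so they can be tried on saved output as well as on a live host.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count zombies per parent PID.
# Input:  rows in `ps -eo pid,ppid,stat,comm` format (header row is ignored
#         because "STAT" does not start with Z).
# Output: "<ppid> <zombie count>" per parent, largest count first.
zombies_by_parent() {
    awk '$3 ~ /^Z/ { count[$2]++ }
         END { for (p in count) print p, count[p] }' |
        sort -k2,2nr -k1,1n
}

# Hypothetical helper: classify swap as "active" or "residual".
# Input:  output of `vmstat <interval> <count>`.
# Skips the two header lines and the first sample (averages since boot),
# then prints "active" if any later sample has si or so above zero.
# Assumes the default procps layout where si is column 7 and so is column 8.
swap_activity() {
    awk 'NR > 3 && ($7 > 0 || $8 > 0) { active = 1 }
         END { print (active ? "active" : "residual") }'
}
```

On a live host this would be run as ps -eo pid,ppid,stat,comm | zombies_by_parent and vmstat 1 10 | swap_activity. Each PPID reported by the first command can then be named with ps -o pid,comm -p <ppid>, which would show whether Tdarr is the parent.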

Potential Actions

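  • If zombies are from a known parent: fix the parent so it reaps its children, or schedule a periodic restart of the parent (zombies cannot be killed directly; they disappear only when the parent reaps them or exits)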
  • If swap is residual: consider swapoff -a && swapon -a during next maintenance, after confirming that available RAM comfortably exceeds the swap in use
  • If OOM kills found: investigate which service caused them and whether memory limits need adjusting
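
Before the swap reset mentioned above, it may be worth checking that the swapped-out pages will fit back into RAM. A minimal sketch of such a check follows. The function name swap_reset_safe and the 1 GiB default margin are assumptions made for illustration; the field names are the standard /proc/meminfo ones.

```shell
#!/usr/bin/env bash

# Hypothetical pre-check for `swapoff -a`.
# Input:  /proc/meminfo-format text on stdin.
# Arg 1:  safety margin in kB (assumed default: 1048576 kB = 1 GiB).
# Prints the used swap and MemAvailable, then returns 0 if MemAvailable
# is larger than used swap plus the margin, and 1 otherwise.
swap_reset_safe() {
    awk -v margin="${1:-1048576}" '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        END {
            used = total - free
            printf "swap used: %d kB, MemAvailable: %d kB\n", used, avail
            exit ((avail > used + margin) ? 0 : 1)
        }'
}
```

Possible usage during the maintenance window: swap_reset_safe < /proc/meminfo && sudo swapoff -a && sudo swapon -a. With 978 MB in swap on a 32 GB host the check would normally pass, but it guards against running the reset while memory is tight.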

Labels

infra-audit, operations

cal added the infra-audit and operations labels 2026-04-03 01:10:23 +00:00
cal closed this issue 2026-04-03 02:05:47 +00:00
Reference: cal/claude-home#30