claude-home/development/docker-buildx-cache-400-error.md
Cal Corum 36aa78e591
docs: sync KB — docker-buildx-cache-400-error.md
2026-03-23 22:00:43 -05:00


---
title: "Fix: Docker buildx cache 400 error — migrated to local volume cache"
description: "Registry buildx cache caused 400 errors; permanent fix is local volume cache on the Gitea Actions runner."
type: troubleshooting
domain: development
tags: [troubleshooting, docker, gitea, ci]
---
# Fix: Docker buildx cache 400 error on CI builds
**Date:** 2026-03-23
**Severity:** Medium — blocks CI/CD Docker image builds, requires manual intervention to retrigger
## Problem
Gitea Actions Docker build workflow fails at the "exporting cache to registry" step with:
```
error writing layer blob: failed to copy: unexpected status from PUT request to
https://registry-1.docker.io/v2/.../blobs/uploads/...: 400 Bad request
```
Because the step fails, the image is never pushed to Docker Hub. Observed on both the Paper Dynasty and Major Domo repos.
## Root Cause
Stale `buildx_buildkit_builder-*` containers accumulate on the Gitea Actions runner host. Each CI build creates a new buildx builder instance but doesn't always clean up. Over time, these stale builders corrupt the registry cache state, causing Docker Hub to reject cache export PUT requests with 400.
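To confirm this diagnosis on the runner host, it helps to count the leftover builder containers. The following is a minimal sketch, assuming the Docker CLI is available on the host; the function names `list_stale_builders` and `count_stale_builders` are hypothetical and not part of any existing tooling:

```bash
# Hypothetical helper: list buildx builder containers left on the runner host.
# The name filter matches the buildx_buildkit_builder-* pattern (substring match).
list_stale_builders() {
    docker ps -a --filter "name=buildx_buildkit_builder" --format '{{.Names}}'
}

# Hypothetical helper: count them. grep -c prints 0 when nothing matches,
# and "|| true" keeps the exit status clean in that case.
count_stale_builders() {
    list_stale_builders | grep -c . || true
}
```

Running `count_stale_builders` on the runner host prints the number of builder containers. A number that grows from build to build is consistent with the accumulation described here.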
## Fix
Kill all stale buildx builder containers on the runner, then retrigger the build:
```bash
# Remove stale builders (xargs -r skips `docker rm` when nothing matches)
ssh gitea 'docker ps -aq --filter "name=buildx_buildkit_builder" | xargs -r docker rm -f'
# Retrigger by deleting and re-pushing the tag
git push origin :refs/tags/<tag> && git push origin <tag>
```
## Lessons
- `type=registry` cache is unreliable on a single-runner setup — stale builders accumulate and corrupt cache state
- Killing stale builders is a temporary fix only
---
## Permanent Fix: Local Volume Buildx Cache (2026-03-24)
**Severity:** N/A — preventive infrastructure change
**Problem:** The `type=registry` cache kept failing with 400 errors. Killing stale builders was a manual band-aid.
**Root Cause:** Each CI build creates a new buildx builder container. On a single persistent runner (`gitea/act_runner`, `--restart unless-stopped`), these accumulate and corrupt the Docker Hub registry cache.
**Fix:** Switched all workflows from `type=registry` to `type=local` backed by a named Docker volume.
### Setup (one-time, on the Gitea runner host)
```bash
# Create named volume
docker volume create pd-buildx-cache
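# Allow the volume in /etc/gitea/runner-config.yaml:
# container:
#   valid_volumes:
#     - pd-buildx-cache
# Remove the old runner container first, otherwise the name is already taken
# (assumes the existing container is also named gitea-runner)
docker rm -f gitea-runner
# Recreate runner container with new volume mount
docker run -d --name gitea-runner --restart unless-stopped \
  -v /etc/gitea/runner-config.yaml:/config.yaml:ro \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v gitea-runner-data:/data \
  -v pd-buildx-cache:/opt/buildx-cache \
  gitea/act_runner:latest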
```
### Workflow changes
1. Add `container.volumes` to mount the named volume into job containers:
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      volumes:
        - pd-buildx-cache:/opt/buildx-cache
```
2. Replace cache directives (each repo uses its own subdirectory):
```yaml
cache-from: type=local,src=/opt/buildx-cache/<repo-name>
cache-to: type=local,dest=/opt/buildx-cache/<repo-name>-new,mode=max
```
3. Add cache rotation step (prevents unbounded growth):
```yaml
- name: Rotate cache
  run: |
    rm -rf /opt/buildx-cache/<repo-name>
    mv /opt/buildx-cache/<repo-name>-new /opt/buildx-cache/<repo-name>
```
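The rotation step deletes the old cache before moving the new one into place, so if `<repo-name>-new` was never written (for example, the cache export was skipped), the existing cache is lost and `mv` fails. A guarded variant is sketched below. It is not what the workflows currently run; the function name `rotate_cache` and its arguments are hypothetical:

```bash
# Hypothetical guarded rotation: swap in the freshly exported cache only if it exists.
# Usage: rotate_cache <cache-root> <repo-name>
rotate_cache() {
    local root="$1"
    local repo="$2"
    if [ ! -d "${root}/${repo}-new" ]; then
        # Nothing was exported; keep the existing cache untouched.
        echo "no new cache at ${root}/${repo}-new; keeping existing cache" >&2
        return 0
    fi
    rm -rf "${root:?}/${repo:?}"
    mv "${root}/${repo}-new" "${root}/${repo}"
}
```

Inside the workflow's `run:` block this could be called as `rotate_cache /opt/buildx-cache <repo-name>`. The `:?` expansions make the `rm -rf` abort if either argument is empty.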
### Key details
- `type=gha` does NOT work on Gitea act_runner (requires GitHub's cache service API)
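- Named volumes (not bind mounts) are required because job containers are sibling containers spawned via the Docker socket, so mounts are resolved by the host's Docker daemon rather than inside the runner container
- `mode=max` caches all intermediate layers, not just those of the final image, which is important for multi-stage builds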
- First build after migration is cold; subsequent builds hit local cache
- Cache size is bounded by the rotation step (~200-600MB per repo)
- Applied to: Paper Dynasty database, Paper Dynasty discord. Major Domo repos still use registry cache (follow-up)
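To check the per-repo cache size against the range quoted above, a small helper can report the size of each subdirectory under the cache root. This is a sketch only; `cache_sizes` is a hypothetical name, and it has to run somewhere the volume is mounted (for example at `/opt/buildx-cache` inside a job container):

```bash
# Hypothetical helper: print the size in MB of each directory under the cache root.
# Usage: cache_sizes <cache-root>
cache_sizes() {
    local root="$1"
    local dir
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when the root has no subdirectories.
        [ -d "$dir" ] || continue
        du -sm "$dir"
    done
}
```

Calling `cache_sizes /opt/buildx-cache` prints one line per repo subdirectory, with the size in megabytes followed by the path.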
### Repos using local cache
| Repo | Cache subdirectory |
|---|---|
| paper-dynasty-database | `/opt/buildx-cache/pd-database` |
| paper-dynasty-discord | `/opt/buildx-cache/pd-discord` |