docs: sync KB — docker-buildx-cache-400-error.md

Cal Corum 2026-03-23 22:00:43 -05:00
parent 7bea39b39b
commit 36aa78e591

---
title: "Fix: Docker buildx cache 400 error — migrated to local volume cache"
description: "Registry buildx cache caused 400 errors; permanent fix is local volume cache on the Gitea Actions runner."
type: troubleshooting
domain: development
tags: [troubleshooting, docker, gitea, ci]
## Lessons
- Monitor buildx builder container accumulation on the Gitea runner — if more than 2-3 are lingering, clean them up proactively
- Consider adding a cleanup step to the CI workflow that prunes old builders after successful builds
- The `cache-to: type=registry` directive in the workflow is the trigger — without registry caching this wouldn't happen, but removing it would slow builds significantly
- `type=registry` cache is unreliable on a single-runner setup — stale builders accumulate and corrupt cache state
- Killing stale builders is a temporary fix only
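The cleanup step suggested above could be sketched as a small shell helper. This is a sketch only: the `buildx_buildkit_` prefix matches the default names `docker buildx` gives its builder containers, and the destructive `docker rm` pipeline is left commented out.

```bash
#!/bin/sh
# List container names that look like stale buildx builders so they can
# be pruned after a successful build. Reads names on stdin, one per line.
list_stale_builders() {
  grep '^buildx_buildkit_' || true
}

# In a CI cleanup step this would be fed from docker, e.g.:
#   docker ps -a --format '{{.Names}}' | list_stale_builders | xargs -r docker rm -f
```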
---
## Permanent Fix: Local Volume Buildx Cache (2026-03-24)
**Severity:** N/A — preventive infrastructure change
**Problem:** The `type=registry` cache kept failing with 400 errors. Killing stale builders was a manual band-aid.
**Root Cause:** Each CI build creates a new buildx builder container. On a single persistent runner (`gitea/act_runner`, `--restart unless-stopped`), these accumulate and corrupt the Docker Hub registry cache.
**Fix:** Switched all workflows from `type=registry` to `type=local` backed by a named Docker volume.
### Setup (one-time, on gitea runner host)
```bash
# Create named volume
docker volume create pd-buildx-cache
# Update /etc/gitea/runner-config.yaml
# valid_volumes:
# - pd-buildx-cache
# Recreate runner container with new volume mount
docker run -d --name gitea-runner --restart unless-stopped \
-v /etc/gitea/runner-config.yaml:/config.yaml:ro \
-v /var/run/docker.sock:/var/run/docker.sock \
-v gitea-runner-data:/data \
-v pd-buildx-cache:/opt/buildx-cache \
gitea/act_runner:latest
```
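Before recreating the runner it may be worth confirming the volume is actually whitelisted; a minimal sketch, assuming the config path and `valid_volumes` key from the setup above (the helper name is hypothetical, and it greps for the list entry anywhere in the file rather than parsing YAML, so treat it as a sanity check only):

```bash
#!/bin/sh
# Check that a volume name appears as a YAML list entry ("- name") in
# the runner config. $1 = config file, $2 = volume name.
has_valid_volume() {
  grep -Eq "^[[:space:]]*-[[:space:]]*$2[[:space:]]*$" "$1"
}

# Example: has_valid_volume /etc/gitea/runner-config.yaml pd-buildx-cache
```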
### Workflow changes
1. Add `container.volumes` to mount the named volume into job containers:
```yaml
jobs:
build:
runs-on: ubuntu-latest
container:
volumes:
- pd-buildx-cache:/opt/buildx-cache
```
2. Replace cache directives (each repo uses its own subdirectory):
```yaml
cache-from: type=local,src=/opt/buildx-cache/<repo-name>
cache-to: type=local,dest=/opt/buildx-cache/<repo-name>-new,mode=max
```
3. Add cache rotation step (prevents unbounded growth):
```yaml
- name: Rotate cache
run: |
rm -rf /opt/buildx-cache/<repo-name>
mv /opt/buildx-cache/<repo-name>-new /opt/buildx-cache/<repo-name>
```
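Note that the `mv` in step 3 fails when the `-new` directory is absent (a cold first build, or a build whose cache export failed), after the old cache has already been deleted. A guarded variant, sketched as a shell function (the function name and arguments are illustrative; `$root/$repo` maps to `/opt/buildx-cache/<repo-name>`):

```bash
#!/bin/sh
# Rotate the local buildx cache only when a fresh export exists, so a
# cold or failed build leaves the previous cache intact.
rotate_cache() {
  root="$1"   # e.g. /opt/buildx-cache
  repo="$2"   # e.g. pd-database
  if [ -d "$root/$repo-new" ]; then
    rm -rf "$root/$repo"
    mv "$root/$repo-new" "$root/$repo"
  fi
}
```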
### Key details
- `type=gha` does NOT work on Gitea act_runner (requires GitHub's cache service API)
- Named volumes (not bind mounts) are required because job containers are sibling containers spawned via Docker socket
- `mode=max` caches all intermediate layers, not just the final image's layers — important for multi-stage builds
- First build after migration is cold; subsequent builds hit local cache
- Cache size is bounded by the rotation step (~200-600MB per repo)
- Applied to: Paper Dynasty database, Paper Dynasty discord. Major Domo repos still use registry cache (follow-up)
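Putting the pieces together, a build job using the local cache might look like the following sketch. The action versions, image tag, and step names are illustrative assumptions, not from the source; the `cache-from`/`cache-to` lines are the directives from step 2 with `pd-database` as the subdirectory:

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      volumes:
        - pd-buildx-cache:/opt/buildx-cache
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: example/pd-database:latest   # illustrative tag
          cache-from: type=local,src=/opt/buildx-cache/pd-database
          cache-to: type=local,dest=/opt/buildx-cache/pd-database-new,mode=max
      - name: Rotate cache
        run: |
          rm -rf /opt/buildx-cache/pd-database
          mv /opt/buildx-cache/pd-database-new /opt/buildx-cache/pd-database
```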
### Repos using local cache
| Repo | Cache subdirectory |
|---|---|
| paper-dynasty-database | `/opt/buildx-cache/pd-database` |
| paper-dynasty-discord | `/opt/buildx-cache/pd-discord` |
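To confirm rotation is keeping caches in the expected ~200-600MB range, per-repo usage can be checked on the runner host. A small sketch with a parameterized root, so it can be pointed at `/opt/buildx-cache`:

```bash
#!/bin/sh
# Print a size summary for each per-repo cache subdirectory under $1.
cache_sizes() {
  du -sh "$1"/*/ 2>/dev/null
}

# Example: cache_sizes /opt/buildx-cache
```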