Upgrade runbook
gocdnext follows semver loosely: minor bumps (0.5.x → 0.6.0)
land database migrations and new features but stay
backward-compatible at the YAML/API surface. The notes for every
release tag flag breaking changes explicitly.
This page walks the canonical Helm upgrade. Adapt for your own
deployment automation (Argo CD, Flux, Pulumi, …) — the steps don’t
change, only how helm upgrade is invoked.
Before you upgrade
1. Read the release notes
Every tag at https://github.com/klinux/gocdnext/releases has the diff vs the previous tag, schema migrations listed by number, and any breaking changes called out at the top. Skim before you pull the trigger — there’s usually nothing urgent, but knowing what’s about to change makes triage easier if a job behaves differently.
2. Back up the database
The control plane is stateful — pipelines, runs, log_lines, artefacts metadata, secrets — all live in Postgres. The chart doesn’t run automatic backups; that’s your operator’s call (Velero, pgBackRest, native logical dumps, whatever your platform team uses).
Quick logical dump for emergencies:

    kubectl -n gocdnext exec -it gocdnext-postgres-dev-0 -- \
      pg_dump -U gocdnext -F c -d gocdnext -f /tmp/gocdnext-pre-upgrade.dump
    kubectl -n gocdnext cp gocdnext-postgres-dev-0:/tmp/gocdnext-pre-upgrade.dump \
      ./gocdnext-pre-upgrade.dump

For external Postgres, use your standard backup flow instead.
To restore, run pg_restore -d gocdnext gocdnext-pre-upgrade.dump.
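The two dump commands above can be wrapped so that the file gets a timestamped name and an empty or missing dump is caught before the upgrade starts. This is only a sketch: the function name backup_gocdnext_db is hypothetical, and the namespace, pod name, user, and database are the ones from the example above and will differ on other installs.

```shell
# Hypothetical helper (not shipped with gocdnext): dump the in-cluster
# Postgres and copy the file to the local machine.
# Usage: backup_gocdnext_db [namespace] [pod] [output-file]
backup_gocdnext_db() {
  ns="${1:-gocdnext}"
  pod="${2:-gocdnext-postgres-dev-0}"
  out="${3:-./gocdnext-pre-upgrade-$(date +%Y%m%d-%H%M%S).dump}"
  remote="/tmp/gocdnext-pre-upgrade.dump"

  # Custom-format dump (-F c) so it can be restored with pg_restore.
  kubectl -n "$ns" exec "$pod" -- \
    pg_dump -U gocdnext -F c -d gocdnext -f "$remote" || return 1

  # Copy the dump out of the pod.
  kubectl -n "$ns" cp "$pod:$remote" "$out" || return 1

  # Refuse to continue with an empty or missing dump.
  if [ ! -s "$out" ]; then
    echo "backup failed: $out is empty or missing" >&2
    return 1
  fi

  # Print the path so callers can record it.
  echo "$out"
}
```

Calling backup_gocdnext_db with no arguments uses the defaults from this page and prints the path of the local dump on success.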
3. Pause the agents (optional)
A clean upgrade doesn’t require this — agents reconnect after the
control plane rolls. But if you want to avoid jobs landing
mid-rollout (and possibly being marked running against an agent
the new control plane doesn’t know about yet), scale them to zero
first:

    kubectl -n gocdnext scale deployment/gocdnext-agent --replicas=0

The reaper picks up any stale running rows after the upgrade
and re-queues them automatically, so the timing window where this
matters is small.
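The resume step later on this page scales back to a fixed --replicas=2. If your agent pool runs at a different size, one option is to record the current count before pausing and restore exactly that value afterwards. The sketch below assumes the Deployment is named gocdnext-agent as in the command above; the function names and the state file path are hypothetical.

```shell
# Hypothetical helpers: remember the agent replica count across a pause.
# The state file location is an assumption; any durable path works.
AGENT_REPLICAS_FILE="${AGENT_REPLICAS_FILE:-./gocdnext-agent-replicas.txt}"

pause_agents() {
  # Read the desired replica count from the Deployment spec.
  current="$(kubectl -n gocdnext get deployment/gocdnext-agent \
    -o jsonpath='{.spec.replicas}')" || return 1

  # Skip saving when already paused so a second call does not record 0.
  if [ "$current" != "0" ]; then
    printf '%s\n' "$current" > "$AGENT_REPLICAS_FILE"
  fi

  kubectl -n gocdnext scale deployment/gocdnext-agent --replicas=0
}

resume_agents() {
  # Restore the recorded count; fail loudly if nothing was saved.
  if [ ! -s "$AGENT_REPLICAS_FILE" ]; then
    echo "no saved replica count in $AGENT_REPLICAS_FILE" >&2
    return 1
  fi
  kubectl -n gocdnext scale deployment/gocdnext-agent \
    --replicas="$(cat "$AGENT_REPLICAS_FILE")"
}
```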
The upgrade

    helm upgrade gocdnext oci://ghcr.io/klinux/charts/gocdnext \
      --version 0.6.4 \
      --namespace gocdnext \
      --reuse-values

--reuse-values keeps every override you set on previous installs.
If you want to change something at the same time, pass --set or
-f values-prod.yaml instead.
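Because --reuse-values carries forward whatever was set on earlier installs, it can be useful to save those overrides before upgrading. The sketch below is one way to do that; the function name upgrade_gocdnext and the output file name are hypothetical, and the added --wait and --timeout flags are a choice made here, not something the chart requires.

```shell
# Hypothetical wrapper around the upgrade command shown above.
# Usage: upgrade_gocdnext <chart-version>
upgrade_gocdnext() {
  version="$1"
  if [ -z "$version" ]; then
    echo "usage: upgrade_gocdnext <chart-version>" >&2
    return 2
  fi

  # Save the user-supplied values that --reuse-values will carry forward.
  helm get values gocdnext --namespace gocdnext -o yaml \
    > "./gocdnext-values-before-$version.yaml" || return 1

  # Same command as above, plus --wait so helm blocks until the
  # rolled resources report ready or the timeout expires.
  helm upgrade gocdnext oci://ghcr.io/klinux/charts/gocdnext \
    --version "$version" \
    --namespace gocdnext \
    --reuse-values \
    --wait --timeout 10m
}
```

For example, upgrade_gocdnext 0.6.4 writes gocdnext-values-before-0.6.4.yaml to the current directory and then runs the upgrade.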
What this does:
- Pulls the chart from the OCI registry.
- Rolls the control plane Deployment first. The new pod boots, runs goose migrations on the database, then accepts traffic. The old pod doesn’t terminate until the new pod’s readinessProbe passes; readiness is gated on the migration completing.
- Rolls the agent Deployment(s). New agents register, old ones close their gRPC stream. The session store invalidates the old ones.
- Rolls the web Deployment.
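The rollout sequence above can be followed from a terminal or a CI job by waiting on each Deployment in turn. This is a sketch under assumptions: the Deployment names are the ones used in the verification commands later on this page, the 5m timeout is arbitrary, and wait_for_rollouts is a hypothetical name.

```shell
# Hypothetical helper: block until each Deployment has finished rolling,
# in the order described above. Stops at the first one that fails.
wait_for_rollouts() {
  for deploy in gocdnext-server gocdnext-agent gocdnext-web; do
    if ! kubectl -n gocdnext rollout status "deployment/$deploy" --timeout=5m; then
      echo "rollout of $deploy did not complete" >&2
      return 1
    fi
  done
}
```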
Migration ordering
Migrations run forward-only (no .down.sql in production; that is
the project doctrine). For each release:
- New migration numbers are listed in the release notes (e.g. 00036, 00039).
- Migrations are idempotent at goose’s level: goose up is safe to re-run.
- Migrations are designed to be backward-compatible with the previous server version. The new server can drive the new schema; the old server doesn’t BREAK on the new schema either, because we never drop columns or rename without a deprecation cycle.
That posture means a rollback (downgrade the server image to the previous version) is generally safe: the old binary keeps running against the new schema. Specific guidance lives in the release notes — read those before downgrading.
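Since the release notes list migrations by number, those numbers can be cross-checked against the server's boot logs with a small text filter. The sketch below assumes log lines of the form "goose: applied 00036_name.sql in 12ms", as in the sample output on this page; both function names are hypothetical.

```shell
# Hypothetical filter: print the migration numbers found in goose boot logs.
# Reads log text on stdin, writes one number per line.
applied_migrations() {
  sed -n 's/^.*goose: applied \([0-9][0-9]*\)_.*$/\1/p'
}

# Hypothetical check: report which expected numbers are absent from the logs.
# Usage: <log source> | missing_migrations 00036 00037 ...
# Returns non-zero if any expected number is absent.
missing_migrations() {
  applied="$(applied_migrations)"
  rc=0
  for want in "$@"; do
    if ! printf '%s\n' "$applied" | grep -qx "$want"; then
      echo "not in logs: $want"
      rc=1
    fi
  done
  return "$rc"
}
```

A typical call would pipe kubectl -n gocdnext logs deployment/gocdnext-server --tail=200 into missing_migrations with the numbers from the release notes. A number reported as absent is not necessarily a failure: the database may already have been past that migration before this boot.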
After the upgrade
1. Verify the rollouts

    kubectl -n gocdnext get pods
    kubectl -n gocdnext rollout status deployment/gocdnext-server
    kubectl -n gocdnext rollout status deployment/gocdnext-agent
    kubectl -n gocdnext rollout status deployment/gocdnext-web

All pods should be Running and Ready, and the server’s /healthz
should return 200; if not, check the logs:

    kubectl -n gocdnext logs deployment/gocdnext-server --tail=200

2. Check the migration trail
Boot logs from the new server show every migration that ran:

    goose: applied 00036_runs_has_services.sql in 12ms
    goose: applied 00037_agents_engine.sql in 4ms
    goose: applied 00038_idx_job_runs_run_id.sql in 41ms
    goose: applied 00039_service_runs.sql in 22ms
    goose: successfully migrated database to version: 39

If you see fewer migrations than the release notes mentioned, the DB might already have been at a later state. Verify via:

    SELECT * FROM goose_db_version ORDER BY id DESC LIMIT 10;

3. Smoke-test a run
Trigger a pipeline you trust (the Run latest button on a known-green pipeline). If it goes through, the upgrade is real.
4. Resume agents (if paused)

    kubectl -n gocdnext scale deployment/gocdnext-agent --replicas=2

Rolling back
If the new version misbehaves and rollback is the right answer:

    helm rollback gocdnext --namespace gocdnext

This bumps every Deployment back to the previous chart’s image tags AND keeps the database at the new schema (forward-only). Because migrations are designed to be backward-compatible with the previous server version, the old server will run fine against it. Still, verify against the release notes for the version you’re rolling back FROM, since that’s where any incompatibility would be flagged.
If a release’s notes specifically call out a non-backward-compatible migration (rare, never silent), the rollback path includes a data-fixup query at the top; read it before running.
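Without a revision argument, helm rollback targets the previous release. When several upgrades happened in a row, it can be safer to look at the history and name the revision explicitly. The sketch below assumes the release is called gocdnext as above; rollback_gocdnext is a hypothetical name, and the --wait flag is a choice made here.

```shell
# Hypothetical wrapper: show recent release history, then roll back.
# Usage: rollback_gocdnext [revision]   (no revision = previous release)
rollback_gocdnext() {
  revision="${1:-}"

  # List the last few revisions so the target is visible in the output.
  helm history gocdnext --namespace gocdnext --max 5 || return 1

  if [ -n "$revision" ]; then
    helm rollback gocdnext "$revision" --namespace gocdnext --wait
  else
    helm rollback gocdnext --namespace gocdnext --wait
  fi
}
```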
Going across major versions
0.x → 1.0 (when it lands) will likely include a one-shot
migration that’s not backward-compatible. Treat it as a
maintenance window:
- Take the database backup (above).
- Stop the control plane (replicas: 0).
- Run the migration:
    kubectl run --rm -it gocdnext-migrate --image=ghcr.io/klinux/gocdnext-server:1.0.0 -- migrate up
- Start the new server.
- Verify, then resume agents.
This pattern is standard for major DB schema overhauls; the release notes will say which version triggers it. For now, every 0.x release is in-place safe.
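Deployment automation can decide up front whether an upgrade is in-place safe or needs the maintenance-window procedure by comparing the running and target versions. A minimal sketch, assuming plain MAJOR.MINOR.PATCH version strings with no prefix or suffix; bump_kind is a hypothetical helper, not part of gocdnext.

```shell
# Hypothetical helper: classify an upgrade as major, minor, patch, or same.
# Usage: bump_kind <from-version> <to-version>
bump_kind() {
  if [ "$1" = "$2" ]; then
    echo same
    return 0
  fi

  # Split off the MAJOR and MINOR fields with parameter expansion.
  from_major="${1%%.*}"
  to_major="${2%%.*}"
  from_rest="${1#*.}"
  to_rest="${2#*.}"
  from_minor="${from_rest%%.*}"
  to_minor="${to_rest%%.*}"

  if [ "$from_major" != "$to_major" ]; then
    echo major
  elif [ "$from_minor" != "$to_minor" ]; then
    echo minor
  else
    echo patch
  fi
}
```

A pipeline could, for instance, refuse to run the plain helm upgrade when bump_kind reports major and point the operator at the steps in this section instead.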