
Upgrade Guide

dockmesh follows semantic versioning. Minor and patch releases are drop-in upgrades. Major versions may require manual steps — always check the changelog before a major upgrade.

  1. Read the changelog for your target version
  2. Stop the dockmesh server
  3. Back up the data directory
  4. Replace the binary
  5. Start the dockmesh server
  6. Verify agents reconnect
  7. Verify all stacks are healthy

Typical total downtime: 30-60 seconds for the UI. Your containers keep running the whole time — the dockmesh server being down doesn’t affect Docker.

# Stop-the-world snapshot of the database + CA
systemctl stop dockmesh
tar czf /opt/dockmesh-backup-$(date +%Y%m%d).tar.gz /opt/dockmesh/data/

Copy the tarball somewhere off the host (S3, another server, your workstation).

If the upgrade fails, restore by extracting the tarball and using the previous binary.
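Before relying on the backup, confirm the tarball is actually readable. A minimal sanity check (the path matches the backup command above):

```shell
# List the archive without extracting; a truncated or corrupt tarball fails here
BACKUP="/opt/dockmesh-backup-$(date +%Y%m%d).tar.gz"
if tar tzf "$BACKUP" > /dev/null 2>&1; then
  echo "backup OK: $BACKUP"
else
  echo "backup unreadable: $BACKUP" >&2
fi
```

Run this before copying the tarball off the host, so you never ship a broken backup.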

# Check current version
/usr/local/bin/dockmesh --version
# Download new binary to a temp path
curl -fsSL https://get.dockmesh.dev/v1.2.0/linux-amd64 -o /tmp/dockmesh-new
# Verify checksum
sha256sum /tmp/dockmesh-new
# Compare to the published checksum in the release notes
# Make executable
chmod +x /tmp/dockmesh-new
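Eyeballing two long hashes is error-prone; `sha256sum -c` does the comparison for you. A small sketch of that guard (the hash placeholder is yours to fill in from the release notes):

```shell
# Abort the upgrade unless the download matches the published checksum.
# sha256sum -c reads "<hash>  <path>" (two spaces) from stdin.
verify_sha256() {
  echo "$1  $2" | sha256sum -c - > /dev/null 2>&1
}
# Usage (placeholder, not a real release checksum):
# verify_sha256 "<hash from release notes>" /tmp/dockmesh-new \
#   || { echo "checksum mismatch, aborting" >&2; exit 1; }
```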
# Move old binary aside
mv /usr/local/bin/dockmesh /usr/local/bin/dockmesh.old
# Install new
mv /tmp/dockmesh-new /usr/local/bin/dockmesh
# Start
systemctl start dockmesh
# Verify logs
journalctl -u dockmesh -f

The server should log:

dockmesh v1.2.0 starting
Database migration from 15 -> 17 complete
HTTP listening on :8080
Agent listener on :8443 (mTLS)
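Rather than watching the journal by hand, you can poll the HTTP port until it answers. A sketch; the `:8080` address comes from the log lines above, and `wait_for` is a hypothetical helper, not part of dockmesh:

```shell
# Retry a command up to N times, one second apart; fail if it never succeeds
wait_for() {
  local tries=$1; shift
  local i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 1
  done
}
# wait_for 30 curl -fsS -o /dev/null http://localhost:8080/ && echo "UI is up"
```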

If migrations fail, the server exits. Roll back:

systemctl stop dockmesh
mv /usr/local/bin/dockmesh.old /usr/local/bin/dockmesh
# Restore the DB if migrations partially ran. Name the exact tarball: a glob
# matching several backups would pass the extras to tar as member names.
tar xzf /opt/dockmesh-backup-$(date +%Y%m%d).tar.gz -C /
systemctl start dockmesh

When an agent reconnects after the server starts, it reports its version. If the server’s new version ships an updated agent protocol, the server instructs the agent to download and hot-swap its binary.

Agent hot-swap:

  1. Server sends upgrade message with the new agent URL + checksum
  2. Agent downloads, verifies, writes new binary atomically
  3. Agent re-executes itself with the new binary (open connections survive via the systemd socket fd)
  4. Server confirms new version in the Hosts page

Typical agent upgrade: < 10 seconds, no stack restart.
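Step 2 of the hot-swap relies on the write-then-rename pattern: the new binary lands in a temp file in the target's own directory, then is moved over the old one, so readers never observe a half-written file. A sketch of that pattern (the helper name and paths are illustrative, not dockmesh's actual code):

```shell
# Atomically replace $dst with $src: mktemp in the target's own directory so
# the final mv is a same-filesystem rename(2), which is atomic
swap_in() {
  local src=$1 dst=$2 tmp
  tmp=$(mktemp "$(dirname "$dst")/.swap.XXXXXX") || return 1
  cp "$src" "$tmp"
  chmod +x "$tmp"
  mv -f "$tmp" "$dst"
}
# swap_in /tmp/agent-download /usr/local/bin/dockmesh-agent
```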

Agents on hosts that can’t reach the download URL (air-gapped) show a “pending upgrade” banner. You upgrade them manually:

# On a machine that can reach the internet
curl -fsSL https://get.dockmesh.dev/v1.2.0/agent-linux-amd64 -o dockmesh-agent.new
# Copy it to the air-gapped host, e.g. over SSH
scp dockmesh-agent.new airgapped-host:/usr/local/bin/dockmesh-agent.new
# On the air-gapped host
chmod +x /usr/local/bin/dockmesh-agent.new
mv /usr/local/bin/dockmesh-agent.new /usr/local/bin/dockmesh-agent
systemctl restart dockmesh-agent

Quick checks:

  • Dashboard loads, shows expected host count
  • All hosts are Online (Hosts page, status column)
  • No unexpected alerts fired during the upgrade window
  • Spot-check one stack — open it, deploy something minor, verify it works
  • Check the audit log for migration events (new version often adds new audit categories)

If you have many agents and want to upgrade them in waves instead of all at once, configure the fleet-wide policy on the Agents page (top panel):

  • Manual (default) — agents stay on their current version until an admin clicks Upgrade on each row. Safest option; it is also what existing installs get on first boot after this feature shipped.
  • Auto — every drifted agent is sent a FrameReqAgentUpgrade on the next evaluator tick. Good for homelabs and small fleets where the blast radius is small.
  • Staged — the controller upgrades a configurable percentage of drifted agents per tick (default 10%), so you can watch the first wave before the rest follow.
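For Staged mode, the per-tick batch is a percentage of the drifted set. As an illustration (the round-up behavior here is an assumption, not confirmed dockmesh behavior; it simply guarantees the wave is never empty):

```shell
# How many drifted agents to upgrade this tick at a given percentage,
# rounding up so at least one agent moves per tick
staged_batch() {
  local drifted=$1 pct=$2
  echo $(( (drifted * pct + 99) / 100 ))
}
staged_batch 42 10   # 10% of 42 drifted agents -> wave of 5
```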

The evaluator runs every 60 seconds. When you change the mode or bump the server version, hit Run now on the policy panel to trigger an immediate evaluation instead of waiting for the next tick.

Every drifted agent also gets a dedicated Upgrade button on its row in the Agents page — useful regardless of mode when a single host fell behind after a network blip.

The evaluator compares each connected agent’s Hello.Version (reported on every WS connect) to pkg/version.Version compiled into the server binary. String equality — no semver comparison. That keeps the logic simple: after a server upgrade, every agent is “drifted” until it reports the new version in its next hello, and normal evaluation takes over.
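In other words, drift detection reduces to a plain string comparison. A one-liner equivalent of the check:

```shell
# Drifted means: the agent's Hello.Version differs from the server's
# compiled-in version. Plain string inequality, no semver parsing.
is_drifted() {
  [ "$1" != "$2" ]   # $1 = agent version, $2 = server version
}
is_drifted "v1.1.0" "v1.2.0" && echo "drifted"
is_drifted "v1.2.0" "v1.2.0" || echo "up to date"
```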

Every dispatched upgrade is logged with action stack.deploy and an action=agent-upgrade detail field, including the target agent id, the server’s version, and which binary was requested. Policy changes log as agent.upgrade_policy. Manual runs log as agent.upgrade_run.
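The field names below are illustrative (only the action values and the detail contents are documented above, so treat the rest as assumptions); an upgrade-dispatch audit entry might look roughly like:

```json
{
  "action": "stack.deploy",
  "detail": "action=agent-upgrade agent=host-03 server_version=v1.2.0 binary=agent-linux-amd64"
}
```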

Downgrades are not supported across database migrations. The server refuses to start if the DB schema is newer than the binary expects.

If you must downgrade:

  1. Stop the server
  2. Restore the pre-upgrade backup
  3. Install the older binary
  4. Start

This loses any changes made after the upgrade.

For v1.x → v2.0-style jumps:

  • Read the migration notes section of the release post
  • Test on a staging dockmesh first (point a throwaway instance at a subset of hosts)
  • Plan for 15-30 min downtime
  • Have the rollback procedure pre-tested