Agent Protocol
Most users never need to understand the wire protocol. But when debugging connectivity, planning network architecture, or evaluating the security model, knowing how it works matters.
The basics
Section titled “The basics”- Transport: WebSocket over TLS (
wss://) - Authentication: Mutual TLS (both sides present certs)
- Direction: Outbound-only from the agent
- Wire format: JSON frames inside the WebSocket — one frame envelope, request/response correlated by ID
Certificate authority
Section titled “Certificate authority”Every dockmesh server has its own internal CA. On first boot it generates an ECDSA P-256 CA keypair and a matching server leaf cert. Both land as PEM files in the data directory next to the SQLite DB:
| File | Purpose |
|---|---|
agents-ca.crt | CA public certificate, 10-year validity |
agents-ca.key | CA private key (mode 0400) |
agents-server.crt | Server cert for the :8443 listener, 5-year validity |
agents-server.key | Matching server private key (mode 0400) |
No separate database encryption layer protects the CA key — its security relies on filesystem permissions (0700 on the data dir + 0400 on the key file). Back up the data directory to back up the CA.
Enrolment
Section titled “Enrolment”When you add a host:
- Server generates a one-time bootstrap token (cryptographically random, hashed before storage) and stores it on the new agent row in
pendingstatus. - The UI shows the install command —
curl -fsSL https://<server>/install/agent.sh?token=<token> | sudo bash— which you paste on the target host. - The install script fetches the agent binary from the same server, drops a systemd unit, and starts the agent.
- The agent calls the enrolment endpoint with the token. The server matches the hash, sees
pending, generates an ECDSA P-256 client keypair, signs a cert with the CA (1-year validity, agent’s UUID as Common Name), and returns cert + key + CA bundle. - The agent writes those into its data dir as
agent.crt,agent.key,ca.crt, plus the dial URL intoagent.url. - The token is invalidated server-side. The agent now uses its client cert for every subsequent connection.
Token TTL: one-time use, no time-based expiry. If you don’t use it in the next 10 minutes you can still use it next week. Rotate it manually if you suspect it leaked (Agents page → host → Rotate enrolment token).
Connection lifecycle
Section titled “Connection lifecycle”The agent dials wss://<server>:8443/connect as soon as its service starts:
- TLS handshake — agent presents
agent.crt, server verifies against the CA, server presentsagents-server.crtwhich the agent verifies against the pinnedca.crt. - WebSocket upgrade — HTTP/1.1 Upgrade.
agent.hello— agent announces itself with its UUID, version, hostname, Docker version, OS info.server.welcome— server records the connection, marks the host online.- Ready — request frames flow.
On any disconnect the agent reconnects with exponential backoff. Server-side, a pingInterval of 30 seconds + a heartbeatGrace of 60 seconds mean the host flips to offline ~60s after the agent stops responding.
Message framing
Section titled “Message framing”The wire format is JSON, not binary. Every frame looks like:
{ "type": "req.containers.list", "id": "8d21e3f0…", "payload": { /* type-specific JSON */ }}type— string identifier (e.g.agent.hello,req.containers.list,res.containers.list).id— opaque correlation id; responses echo the matching request’s id so the server’s waiting goroutine can route the reply.payload— JSON value; shape depends ontype.
The choice of JSON-over-WebSocket (rather than gRPC or a binary protocol) is deliberate: easier to inspect with Wireshark or a proxy, simpler to debug from journalctl, no codegen step. The per-message overhead is negligible for the kind of small control-plane traffic dockmesh sends.
Multiplexing
Section titled “Multiplexing”A single WebSocket carries many concurrent operations. There is no explicit “stream id” channel — multiplexing happens at the frame level: request/response frames share the connection, and longer-running streams (log tails, exec sessions, stats subscriptions) are layered on by sending frames with the same id until the requester sends a close frame.
When you close a log viewer tab in the UI the server sends a close frame; the agent stops tailing the container’s log.
Frame types
Section titled “Frame types”dockmesh has ~40 frame types covering lifecycle + every operation. A few representative ones:
| Frame | Direction | Purpose |
|---|---|---|
agent.hello | agent → server | Initial announcement on connect |
server.welcome | server → agent | Acknowledgement + initial config |
agent.heartbeat | agent → server | Sent in response to server.ping |
server.ping | server → agent | Every 30s; missing replies for 60s flip the host offline |
req.containers.list | server → agent | Ask the agent for its container list |
req.containers.start | server → agent | Lifecycle action |
req.images.list | server → agent | Resource listing |
req.volume.browse | server → agent | List one directory inside a named volume |
req.deploy.stack | server → agent | Apply a compose project |
req.agent.upgrade | server → agent | Hot-swap the agent binary |
The full list lives in internal/agents/protocol.go in the source tree — that file is the single source of truth.
Cert rotation
Section titled “Cert rotation”Client certs are issued for 1 year. There is no auto-renewal: the agent does not currently send a renewal request as expiry approaches. Long-running agents need to be re-enrolled manually before their cert expires — rotate the token, run the install one-liner again, the cert is replaced. An agent with an expired cert sees TLS handshake failures and is marked offline; no silent breakage. Auto-renewal is on the roadmap.
Revocation
Section titled “Revocation”When you remove a host from dockmesh, the agent row is deleted (or marked revoked). On the next handshake the server checks the row in the database and refuses the connection. The agent logs the rejection and exits. No formal CRL or OCSP — the check is a single DB lookup per connection attempt, takes effect immediately.
Agent upgrade
Section titled “Agent upgrade”The upgrade controller (see Upgrade Guide) dispatches a req.agent.upgrade frame to drifted agents with the download URL + checksum. The agent downloads the new binary from <server>/install/dockmesh-agent-linux-<arch>, verifies the SHA, writes it atomically, and re-executes. Open operations (log tails, exec) are interrupted but the UI’s WebSocket clients reconnect automatically and resume.
Why outbound-only?
Section titled “Why outbound-only?”Tradition says “server-to-agent” models pull from a central server. dockmesh flips this for practical reasons:
- No inbound firewall holes on agent hosts (NAT, corporate firewalls, home routers all work unchanged)
- Works behind CGNAT (common in home labs)
- Works from anywhere the agent has internet access (coffee shop, traveling laptop, edge locations)
- Agent knows when it has Docker running — no need for server to poll
The tradeoff: the server is reachable to all agents, which means protecting it matters. See Hardening.
Why not gRPC?
Section titled “Why not gRPC?”dockmesh deliberately picked JSON-over-WebSocket over gRPC:
- WebSocket survives corporate proxies that strip HTTP/2 — gRPC does not always
- Easier debugging —
journalctl, Wireshark, browser DevTools, anything that reads text speaks the protocol - Smaller agent binary (no gRPC runtime)
- Performance is more than sufficient for the small control-plane messages dockmesh sends; binary serialisation would be over-engineering at this scale
Troubleshooting protocol issues
Section titled “Troubleshooting protocol issues”See Troubleshooting → Agent won’t connect.
For deep debugging, enable debug logging:
# On the agentsystemctl set-environment DOCKMESH_LOG_LEVEL=debugsystemctl restart dockmesh-agentjournalctl -u dockmesh-agent -fSee also
Section titled “See also”- Agent mTLS — certificate management
- Multi-Host — user-facing multi-host features
- Hardening — server-side protection