
Troubleshooting

Start here when something’s not working. If this page doesn’t cover your case, check the FAQ or open a GitHub Discussion.

Agent offline

The host shows Offline or Connecting… indefinitely.

  1. Agent service running?
    Terminal window
    systemctl status dockmesh-agent
    journalctl -u dockmesh-agent -n 100
  2. Can the agent resolve the server DNS?
    Terminal window
    # On the agent host
    getent hosts dockmesh.example.com
  3. Can it reach the server port?
    Terminal window
    openssl s_client -connect dockmesh.example.com:8443 -servername dockmesh.example.com </dev/null # look for "Verify return code: 0 (ok)"
  4. Is the certificate valid? If you rotated the CA, re-enroll the agent:
    Terminal window
    # Get a new enrollment token from the UI
    dockmesh-agent enroll --server https://dockmesh.example.com --token <new-token>
  5. Clock skew? mTLS is sensitive to clock drift. Both sides should run NTP:
    Terminal window
    timedatectl status

Common causes:

  • A firewall rule change blocking outbound 8443
  • TLS certificate expired on the server (uncommon; it auto-renews when issued via ACME)
  • Agent cert revoked (check Hosts → revoked list on server)
  • Network split between server and agent subnets
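The checks above can be bundled into one script to run on the agent host. This is a minimal sketch, not a dockmesh tool: the function names (`check`, `tcp_reachable`, `ntp_synced`, `server_cert_expiry`, `agent_preflight`) and the `DOCKMESH_HOST`/`DOCKMESH_PORT` variables are hypothetical, and it assumes bash, systemd, coreutils `timeout`, and OpenSSL are installed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight script for an agent that shows Offline.
# DOCKMESH_HOST / DOCKMESH_PORT are illustrative variables, not dockmesh settings.

# check LABEL CMD...: runs CMD silently and prints "ok" or "FAIL" with LABEL.
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   $label"
  else
    echo "FAIL $label"
  fi
}

# Succeeds if a TCP connection to HOST:PORT opens within 5 seconds (bash /dev/tcp).
tcp_reachable() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2"
}

# Succeeds if systemd reports the clock as NTP-synchronized.
ntp_synced() {
  [ "$(timedatectl show -p NTPSynchronized --value)" = "yes" ]
}

# Prints the expiry date (notAfter=...) of the certificate the server presents.
server_cert_expiry() {
  openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate 2>/dev/null
}

agent_preflight() {
  local host=${DOCKMESH_HOST:-dockmesh.example.com}
  local port=${DOCKMESH_PORT:-8443}
  check "agent service active" systemctl is-active --quiet dockmesh-agent
  check "DNS resolves $host" getent hosts "$host"
  check "TCP $host:$port reachable" tcp_reachable "$host" "$port"
  check "clock NTP-synchronized" ntp_synced
  server_cert_expiry "$host" "$port" || echo "FAIL no certificate from $host:$port"
}
```

Source the file and run `agent_preflight`, or override the target with `DOCKMESH_HOST=dockmesh.example.com DOCKMESH_PORT=8443 agent_preflight`. Each FAIL line maps back to one of the numbered steps.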

Stack deploy fails

The deploy log shows an error and the stack status changes to error.

The streaming deploy log has the real cause. Common ones:

pull access denied — image is private and the registry credentials aren’t configured. See Images → Registry auth.

port is already allocated — another container is using the host port. Find it: Containers → filter by port. Either stop the existing container or change the port in the new stack.

driver failed programming external connectivity — usually means the host ran out of available ports in the ephemeral range, or iptables is misconfigured. Restart the Docker daemon on that host.

network <name> declared as external, but could not be found — the external network isn’t there. Create it first (Networks → New network) or remove external: true.

no space left on device — the host disk is full, usually the filesystem holding /var/lib/docker. Prune images and volumes via dockmesh, or clean up host logs.
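When a deploy log is long, a small lookup from error text to the fixes above can speed up triage. The sketch below is hypothetical (the function names are illustrative, not part of dockmesh). `containers_on_port`, `ensure_network`, and `docker_disk_report` use only standard Docker CLI commands and must run on the affected host.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper: maps a deploy-log line to the fix described above.
# The patterns mirror the error strings quoted in this section.
deploy_error_hint() {
  case "$1" in
    *"pull access denied"*)
      echo "registry-auth: configure credentials under Images -> Registry auth" ;;
    *"port is already allocated"*)
      echo "port-conflict: stop the other container or change the port" ;;
    *"driver failed programming external connectivity"*)
      echo "docker-networking: restart the Docker daemon on that host" ;;
    *"declared as external, but could not be found"*)
      echo "missing-network: create the network or remove external: true" ;;
    *"no space left on device"*)
      echo "disk-full: prune images/volumes or clean up host logs" ;;
    *)
      echo "unknown: read the full deploy log" ;;
  esac
}

# Lists containers publishing the given host port (docker's "publish" filter).
containers_on_port() {
  docker ps --filter "publish=$1" --format '{{.Names}}\t{{.Ports}}'
}

# Creates an external network only if it does not exist yet.
ensure_network() {
  docker network inspect "$1" >/dev/null 2>&1 || docker network create "$1"
}

# Shows free space where Docker stores data, then Docker's own usage breakdown.
docker_disk_report() {
  df -h /var/lib/docker
  docker system df
}
```

For example, `containers_on_port 8080` lists whatever already holds host port 8080, and `ensure_network proxy` creates a network named `proxy` only if it is missing (both values are placeholders).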

SSO login fails

Clicking the SSO button sends you to the IdP; you log in and come back, but see “Authentication failed” or get bounced to the login page.

  1. Redirect URI matches exactly? The URI in your IdP config must match <your-dockmesh-url>/auth/oidc/callback character-for-character. http vs https, trailing slash, port — all must match.
  2. Clock skew? OIDC tokens are valid only for a short window, and validation tolerates little clock drift (often around 60s). If the server and IdP clocks differ by more than that, tokens are rejected.
  3. Group claim present? If you use group mappings, the ID token must include the groups claim. Some IdPs require enabling “groups scope” explicitly.
  4. Logs on the dockmesh server:
    Terminal window
    journalctl -u dockmesh | grep -i oidc
    Look for specific errors such as invalid token signature, missing claim, or discovery failed.
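Two of these checks are easy to script: comparing the redirect URI character for character, and measuring clock skew against the IdP. A minimal sketch, assuming bash, curl, and GNU date; the helper names are hypothetical, and dropping one trailing slash from the base URL before appending `/auth/oidc/callback` is an assumption about how you write your dockmesh URL. `/.well-known/openid-configuration` is the standard OpenID Connect discovery path, so fetching it by hand from the dockmesh server is a quick way to test discovery.

```shell
#!/usr/bin/env bash
# Hypothetical OIDC helpers; names are illustrative, not dockmesh commands.

# Callback URL derived from the dockmesh base URL (assumption: one trailing "/" is dropped).
expected_callback() {
  printf '%s/auth/oidc/callback\n' "${1%/}"
}

# Succeeds only if the URI registered at the IdP matches character for character.
redirect_uri_matches() {
  [ "$(expected_callback "$1")" = "$2" ]
}

# Fetches the IdP's standard OpenID Connect discovery document.
fetch_discovery() {
  curl -fsS --max-time 10 "${1%/}/.well-known/openid-configuration"
}

# Seconds this host's clock is ahead of (positive) or behind (negative) the
# Date header returned by URL. Resolution is about one second.
clock_skew_seconds() {
  local remote
  remote=$(curl -sSI --max-time 10 "$1" | tr -d '\r' \
    | sed -n 's/^[Dd][Aa][Tt][Ee]: *//p' | head -n 1)
  [ -n "$remote" ] || return 1
  echo $(( $(date -u +%s) - $(date -u -d "$remote" +%s) ))
}
```

For example, `fetch_discovery https://idp.example.com/realms/main` should print JSON containing an `issuer` field, and `clock_skew_seconds https://idp.example.com` should print a number close to 0 (the IdP URLs are placeholders).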

Slow UI

Pages take several seconds to load.

  1. Server load?
    Terminal window
    top # check dockmesh CPU/mem
    iostat -x 1 3 # check disk wait (%iowait, await); iostat is in the sysstat package
  2. Database size?
    Terminal window
    ls -lh /opt/dockmesh/data/dockmesh.db
    If it’s > 1 GB, consider enabling audit log retention (see Audit Log).
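
Fixes:

  • Vacuum the SQLite DB if fragmentation is high:
    Terminal window
    sudo systemctl stop dockmesh # precaution: make sure nothing else holds the DB open
    sqlite3 /opt/dockmesh/data/dockmesh.db "VACUUM;"
    sudo systemctl start dockmesh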
  • Migrate to PostgreSQL for fleets > 200 hosts. Set DOCKMESH_DB_URL=postgres://....
  • Reduce stats retention in Settings if disk I/O is the bottleneck.
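To decide between these fixes, it helps to check the DB size and how much space a VACUUM could reclaim. A minimal sketch, assuming GNU `stat` and the `sqlite3` CLI; the function names are hypothetical, and the 1 GiB default threshold approximates the 1 GB rule of thumb above. SQLite's `PRAGMA freelist_count` counts unused pages, which VACUUM releases, so freelist pages times `page_size` is a rough measure of reclaimable space.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing the dockmesh SQLite database.

# db_size_check [DB] [LIMIT_BYTES]: prints the size; returns 1 if it exceeds LIMIT.
db_size_check() {
  local db=${1:-/opt/dockmesh/data/dockmesh.db}
  local limit=${2:-$((1024 * 1024 * 1024))} # 1 GiB default
  local size
  size=$(stat -c %s "$db") || return 2
  if [ "$size" -gt "$limit" ]; then
    echo "large: $size bytes (consider audit log retention, VACUUM, or PostgreSQL)"
    return 1
  fi
  echo "ok: $size bytes"
}

# Approximate bytes VACUUM would reclaim: unused (freelist) pages times page size.
db_free_bytes() {
  local db=${1:-/opt/dockmesh/data/dockmesh.db}
  local pages page_size
  pages=$(sqlite3 "$db" 'PRAGMA freelist_count;') || return 1
  page_size=$(sqlite3 "$db" 'PRAGMA page_size;') || return 1
  echo $(( pages * page_size ))
}
```

With no arguments, `db_size_check` checks the default path used on this page; a large `db_free_bytes` result relative to the file size suggests VACUUM is worth running.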

Backup fails

A backup job shows as failed.

  1. Job log — click the failed run, read the error
  2. Target still reachable? — test in Backups → Targets → [target] → Test connection
  3. Disk space on target — SFTP/NAS with a full disk silently fails
  4. Encryption passphrase known? — restore tests require it; rotating it orphans old backups

Common errors:

  • dial tcp ... i/o timeout — target host is unreachable (firewall? DNS?)
  • permission denied — credentials have read but not write access on target
  • pre-backup hook exited 1 — the hook script failed (check the hook command/image)
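For SFTP targets, the full-disk and permission-denied cases can be checked in one pass: ask the server for free space, then write and delete a small probe file. A sketch, assuming OpenSSH's `sftp` with key-based authentication (batch mode never prompts for a password) and a server that supports `df`; the function name and the probe file name `.dockmesh-write-probe` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical SFTP target check: free space, then a write/delete probe.
# Usage: sftp_target_check user@host /remote/backup/dir
sftp_target_check() {
  local target=$1 dir=$2 probe rc
  probe=$(mktemp) || return 1
  printf 'dockmesh write probe\n' > "$probe"
  # "-b -" reads commands from stdin; batch mode stops at the first failing command.
  sftp -o ConnectTimeout=10 -b - "$target" <<EOF
df -h "$dir"
put "$probe" "$dir/.dockmesh-write-probe"
rm "$dir/.dockmesh-write-probe"
EOF
  rc=$?
  rm -f "$probe"
  return "$rc"
}
```

A permission error on `put` matches the read-only credentials case above; an available figure near zero from `df` points to the full-disk case.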

Migration fails

A migration aborts partway and the stack is back on the source host.

  1. Pre-flight — did any check fail? Volume size mismatch is common.
  2. Network — bandwidth between source and destination; migrations of 100+ GB volumes can take hours on slow links
  3. Destination disk full mid-transfer — pre-flight checks free space, but if something else fills it up mid-transfer, migration aborts

Automatic rollback should leave you in the starting state. If it doesn’t, clean up manually:

Terminal window
# On source
docker compose -f /opt/dockmesh/stacks/<host>/<stack>/compose.yaml up -d
# On destination
docker compose -f /opt/dockmesh/stacks/<host>/<stack>/compose.yaml down
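Before retrying, you can compare a volume's size on the source with free space on the destination yourself. A sketch assuming GNU coreutils and `docker volume inspect`; the function names and the 10% safety margin are assumptions, not dockmesh's pre-flight logic.

```shell
#!/usr/bin/env bash
# Hypothetical size check for a migration: run volume_bytes on the source host
# and free_bytes on the destination, then compare with fits_with_margin.

# Apparent size in bytes of a named Docker volume (reading it may need root).
volume_bytes() {
  local mountpoint
  mountpoint=$(docker volume inspect --format '{{ .Mountpoint }}' "$1") || return 1
  du -sb "$mountpoint" | cut -f1
}

# Bytes available on the filesystem holding PATH (default: Docker's data root).
free_bytes() {
  df --output=avail -B1 "${1:-/var/lib/docker}" | tail -n 1 | tr -d ' '
}

# Succeeds if NEEDED bytes plus MARGIN percent (default 10) fit into AVAILABLE bytes.
fits_with_margin() {
  local needed=$1 available=$2 margin=${3:-10}
  [ $(( needed * (100 + margin) / 100 )) -le "$available" ]
}
```

Run `volume_bytes <volume>` on the source and `free_bytes` on the destination, then pass both numbers to `fits_with_margin`.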

Alerts not firing

An alert should have fired, but no notification arrived.

  1. Rule enabled? (Alerts → Rules → check toggle)
  2. Mute rule active? (Alerts → Mutes — any matching mute?)
  3. Channel working? (Settings → Channels → [channel] → Send test)
  4. Cooldown? — a recent fire for the same resource suppresses re-alerts
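The order of these checks can be written down as a small decision function. This is a conceptual model only, assuming a per-resource cooldown measured from the last time the alert fired; it is not dockmesh's alerting code, and the function names and arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Conceptual model of the checklist above, NOT dockmesh's alerting code.
# Timestamps are Unix seconds; LAST_FIRED is empty if the alert never fired.

# Succeeds if COOLDOWN seconds have passed since the last fire for this resource.
cooldown_allows_fire() {
  local last_fired=$1 now=$2 cooldown=$3
  [ -z "$last_fired" ] || [ $(( now - last_fired )) -ge "$cooldown" ]
}

# why_no_alert ENABLED MUTED LAST_FIRED NOW COOLDOWN  (ENABLED/MUTED are yes/no)
why_no_alert() {
  local enabled=$1 muted=$2 last_fired=$3 now=$4 cooldown=$5
  if [ "$enabled" != "yes" ]; then
    echo "rule disabled"
  elif [ "$muted" = "yes" ]; then
    echo "a matching mute is active"
  elif ! cooldown_allows_fire "$last_fired" "$now" "$cooldown"; then
    echo "in cooldown for $(( cooldown - (now - last_fired) ))s"
  else
    echo "rule should fire; check the channel with Send test"
  fi
}
```

For example, an alert that last fired 100 seconds ago under a 300-second cooldown stays suppressed for another 200 seconds.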

Container logs don't stream

You open Container → Logs and nothing shows up, or the stream stops after a few seconds.

  • Click Reconnect — WebSocket may have dropped
  • Check the agent version on the host: agents older than 1.0.0-beta.3 have a streaming bug, so upgrade if yours is older
  • If you're behind a corporate proxy, it may strip WebSocket upgrade requests; ask your network admin to allow WebSocket traffic

Locked out of the UI

The login page rejects your credentials, or returns:

account temporarily locked — try again in N minutes

Five failed login attempts in a row trigger a 15-minute lockout per user (default — configurable via auth.lockout_max_attempts and auth.lockout_duration_minutes). This usually comes from:

  • Browser autofill replaying a stale saved password for the same URL
  • Copy-paste from a password manager that got the wrong entry
  • An actual forgotten password
  • Someone on your network probing with wrong credentials (rare for homelab, more relevant on public-internet deploys)

To recover:

  1. Wait out the lockout (15 minutes by default). It's time-based, so no admin action is needed, and the login error tells you how much time is left.

  2. If you know the password and don't want to wait:

    Terminal window
    sudo dockmesh admin unlock --user admin

    Clears the lockout without touching the password.

  3. If you forgot the password:

    Terminal window
    cd /var/lib/dockmesh # service's working directory
    sudo dockmesh admin reset-password --user admin --password 'NewSecure#2026'

    Also clears the lockout as a side effect.

  4. If the login page rejects you silently (not a lockout error):

    • Delete the saved password for the dockmesh URL in your browser’s password manager, then type the password by hand
    • Try an Incognito/Private window — rules out autofill + cookie issues
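Step 3 puts the new password on the command line, where it ends up in your shell history. A cautious variant reads it without echoing it first. This is a sketch: the wrapper name and the `DOCKMESH_WORKDIR` variable are hypothetical, while the `dockmesh admin reset-password` flags are the ones shown in step 3. The password is still passed as an argument, so it can briefly appear in `ps` output while the command runs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the reset-password command from step 3.
# Reads the new password silently so it never lands in shell history.
reset_admin_password() {
  local user=${1:-admin}
  local dir=${DOCKMESH_WORKDIR:-/var/lib/dockmesh} # service's working directory
  local pw
  read -rs -p 'New password: ' pw || return 1
  echo >&2 # move past the prompt, since the typed text was not echoed
  if [ -z "$pw" ]; then
    echo 'empty password, aborting' >&2
    return 1
  fi
  # Subshell so the caller's current directory is left unchanged.
  (cd "$dir" && sudo dockmesh admin reset-password --user "$user" --password "$pw")
}
```

Call it as `reset_admin_password admin`; like step 3, the reset also clears the lockout.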

Prevention:

  • Use a strong, memorable password that you type instead of relying on autofill
  • Raise auth.lockout_max_attempts (e.g. to 10) under Settings → System if 5 attempts is too strict

Still stuck?

If none of the above fixes your issue:

  • GitHub Discussions — searchable, other users can help, answers benefit everyone
  • GitHub Issues — for bugs (include dockmesh version, OS, minimal reproduction)
  • Security issues only: security@dockmesh.dev

Always include:

  • dockmesh version (dockmesh --version)
  • OS + Docker version
  • Relevant log snippets (journalctl or in-UI logs)
  • Steps to reproduce

See also:

  • FAQ — common conceptual questions
  • Hardening — preventive measures
  • Upgrade Guide — if the issue started after an upgrade
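To gather the version and log details a bug report should include in one go, a small collector script helps; the steps to reproduce still have to be written by hand. A sketch, assuming systemd's `journalctl` and the unit names used elsewhere on this page (`dockmesh` on the server, `dockmesh-agent` on agents); the function name and output file are hypothetical, and reading the journal may require sudo. Review the file for secrets before attaching it anywhere public.

```shell
#!/usr/bin/env bash
# Hypothetical bug-report collector; writes version info and recent logs to one file.
collect_report() {
  local out=${1:-dockmesh-report.txt}
  {
    echo "== dockmesh version"
    if command -v dockmesh >/dev/null 2>&1; then
      dockmesh --version 2>&1 || true
    else
      echo "dockmesh binary not found on this host"
    fi
    echo "== OS"
    if [ -r /etc/os-release ]; then
      (. /etc/os-release && echo "$PRETTY_NAME") # subshell keeps its variables local
    fi
    uname -sr
    echo "== Docker version"
    if command -v docker >/dev/null 2>&1; then
      docker version --format '{{ .Server.Version }}' 2>&1 || true
    else
      echo "docker not found"
    fi
    echo "== Recent logs"
    for unit in dockmesh dockmesh-agent; do
      echo "-- $unit"
      journalctl -u "$unit" -n 200 --no-pager 2>&1 || true
    done
  } >"$out"
  echo "wrote $out"
}
```

Run `collect_report` on the affected host (server or agent) and attach the resulting file along with your reproduction steps.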