Monitoring Stack (Prometheus + Grafana + Loki)

A complete observability stack:

  • Prometheus — metrics database
  • Grafana — dashboards
  • Loki — log aggregation
  • Promtail — log shipper
  • cAdvisor + node-exporter — container + host metrics

Everything is managed by dockmesh, and Prometheus scrapes metrics from dockmesh itself and from your other stacks.

When you're done, you'll have:

  • grafana.example.com — dashboards for CPU, memory, disk, network, per-container + per-host
  • Centralized logs searchable with LogQL (Grafana → Explore → Loki)
  • Alerts on metric thresholds (can flow into dockmesh Alerts too)

Stacks → New stack → name it monitoring, then use this Compose file:

services:
  prometheus:
    image: prom/prometheus:latest
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=30d
      - --storage.tsdb.path=/prometheus
      - --web.enable-lifecycle
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prom_data:/prometheus
    extra_hosts:
      # On Linux, host.docker.internal only resolves if mapped explicitly;
      # the dockmesh scrape job in prometheus.yml relies on it.
      - "host.docker.internal:host-gateway"
  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    command:
      - --path.rootfs=/host
    pid: host
    volumes:
      - /:/host:ro,rslave
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    restart: unless-stopped
    privileged: true
    devices:
      - /dev/kmsg:/dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
  loki:
    image: grafana/loki:latest
    restart: unless-stopped
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki_data:/loki
  promtail:
    image: grafana/promtail:latest
    restart: unless-stopped
    depends_on: [loki]
    command: -config.file=/etc/promtail/promtail-config.yml
    volumes:
      - ./promtail-config.yml:/etc/promtail/promtail-config.yml:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
  grafana:
    image: grafana/grafana-oss:latest
    restart: unless-stopped
    depends_on: [prometheus, loki]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
      GF_SERVER_ROOT_URL: https://grafana.example.com
      GF_INSTALL_PLUGINS: grafana-piechart-panel
    volumes:
      - grafana_data:/var/lib/grafana
volumes:
  prom_data:
  loki_data:
  grafana_data:

Create /opt/dockmesh/stacks/local/monitoring/prometheus.yml:

global:
  scrape_interval: 30s
  evaluation_interval: 30s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: dockmesh
    static_configs:
      - targets: ["host.docker.internal:8080"]
    metrics_path: /metrics
    # dockmesh exposes Prometheus metrics on /metrics by default
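
Services from your other stacks are scraped the same way, with one more job under scrape_configs. The following is a minimal sketch, not part of the original config: the job name my-app, its hostname, and port 9000 are hypothetical, and it assumes the service is reachable from the Prometheus container (for example over a shared Docker network).

```yaml
scrape_configs:
  # ...keep the existing jobs from prometheus.yml above this entry...
  - job_name: my-app                 # hypothetical job name
    metrics_path: /metrics           # the Prometheus default, shown for clarity
    static_configs:
      - targets: ["my-app:9000"]     # hypothetical host:port of the service
```

Because the stack starts Prometheus with --web.enable-lifecycle, an HTTP POST to /-/reload on port 9090 makes it pick up the edited file without a restart.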

And /opt/dockmesh/stacks/local/monitoring/promtail-config.yml:

server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 30s
    relabel_configs:
      # Docker reports names as "/name"; the regex strips the leading slash
      - source_labels: ["__meta_docker_container_name"]
        regex: "/(.*)"
        target_label: container
      - source_labels: ["__meta_docker_container_label_com_docker_compose_project"]
        target_label: stack

Deploy the stack, then add a proxy route:

  • Domain: grafana.example.com
  • Target: monitoring_grafana_1 (Compose v2 names this container monitoring-grafana-1)
  • Port: 3000
  • TLS: Automatic

Log in as admin with the password you set in the GRAFANA_PASSWORD environment variable.

In Grafana:

Configuration → Data sources → Add (in recent Grafana versions: Connections → Data sources):

  1. Prometheus: URL http://prometheus:9090
  2. Loki: URL http://loki:3100

Both hostnames resolve on the stack's internal Compose network.
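
The same two data sources can also be declared in a Grafana provisioning file instead of being added by hand. This is a sketch under assumptions: the file name datasources.yml is hypothetical, and the grafana service would need an extra bind mount placing it under /etc/grafana/provisioning/datasources/, which the Compose file above does not include.

```yaml
# datasources.yml (hypothetical name), mounted into the grafana container at
# /etc/grafana/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend makes the requests
    url: http://prometheus:9090    # service name on the Compose network
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
```

Grafana reads provisioning files at startup, so the data sources reappear even if the grafana_data volume is recreated.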

Grafana community dashboards (import by ID in Grafana → Dashboards → Import):

  • 1860 — Node Exporter Full (host metrics)
  • 14282 — cAdvisor (container metrics)
  • 13946 — Docker Logs (via Loki)
  • 15141 — Docker Swarm + Containers (works for Compose too)

Good starting set — customize from there.

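Prometheus alerts can reach dockmesh via webhook. Prometheus itself only evaluates alert rules; Alertmanager, which is not part of the stack above, delivers the notifications:

  1. In dockmesh: Settings → Channels → New webhook channel → note the URL
  2. In Alertmanager: add a webhook receiver pointing to dockmesh’s webhook URL, and list Alertmanager under alerting in prometheus.yml
  3. Alerts from Prometheus now flow through dockmesh’s notification channels (Slack, Discord, etc.)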
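
The wiring in step 2 can be sketched as two config fragments. Everything here is an assumption rather than part of the original stack: it presumes an extra Alertmanager container (for example prom/alertmanager) reachable as alertmanager on port 9093, and the webhook URL is a placeholder to be replaced with the one dockmesh shows in step 1. Whether dockmesh accepts Alertmanager's default webhook payload is not verified here.

```yaml
# alertmanager.yml, read by the (assumed) Alertmanager container
route:
  receiver: dockmesh
  group_by: ["alertname"]
receivers:
  - name: dockmesh
    webhook_configs:
      - url: "REPLACE_WITH_DOCKMESH_WEBHOOK_URL"   # placeholder, from step 1
        send_resolved: true                        # also notify when an alert clears
---
# prometheus.yml additions: tell Prometheus where Alertmanager lives
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]           # assumed service name and default port
```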

Or define alerts natively in Grafana and send them to the same dockmesh webhook URL.
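
For the Prometheus route, the thresholds themselves live in a rule file. Below is a small sketch with two rules built on metrics this stack already collects (up from every job, filesystem metrics from node-exporter). The file path, group name, thresholds, and durations are illustrative assumptions, and the file has to be mounted into the Prometheus container and listed under rule_files in prometheus.yml.

```yaml
# rules/alerts.yml (hypothetical path)
groups:
  - name: basic
    rules:
      - alert: TargetDown
        expr: up == 0                 # a scrape target stopped responding
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
      - alert: HostDiskAlmostFull
        # less than 10% free on the host root filesystem
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Root filesystem on {{ $labels.instance }} is over 90% full"
```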

Rough disk usage to plan for:

  • Prometheus with 30d retention, 100 targets, 30s scrape: ~5 GB
  • Loki with Docker logs from 50 containers: ~10 GB/month (compressed)
  • Grafana: <100 MB unless you load many dashboards

Add a backup job for the prom_data, loki_data, and grafana_data volumes. Of the three, losing the Grafana dashboards hurts the most.
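
A complementary way to protect dashboards is to keep them as exported JSON files and let Grafana load them from disk. This is a sketch under assumptions: the provider name and the /etc/grafana/dashboards directory are hypothetical, and both this file and the JSON directory would need bind mounts on the grafana service that the Compose file above does not define.

```yaml
# dashboards.yml (hypothetical name), mounted into the grafana container at
# /etc/grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1
providers:
  - name: default
    type: file
    disableDeletion: true              # keep dashboards even if a file disappears
    options:
      path: /etc/grafana/dashboards    # hypothetical directory of exported dashboard JSON
```

With the JSON files kept next to the stack definition, a lost grafana_data volume costs only users and preferences, not the dashboards themselves.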