Alerts

dockmesh evaluates alert rules against the metrics pipeline every 30 seconds. When a rule fires, notifications go out via the configured channels.

A rule has:

| Field | Real values |
| --- | --- |
| Name | Free text |
| Metric | cpu_percent or mem_percent (the only two metrics the engine currently evaluates) |
| Container filter | A glob against the container name. * matches every container, paperless-* matches just the paperless stack, etc. |
| Operator | gt (also accepts >/>=) or lt (also </<=). Equality and delta-style operators are not implemented. |
| Threshold | The percentage the metric is compared against |
| Duration (sec) | How long the threshold must hold before the alert fires |
| Severity | info, warning, or critical |
| Cooldown (sec) | Suppress re-notify on the same rule for this long after a fire |
| Channels | List of notification-channel IDs that receive the fire |
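
To make the field list concrete, here is a minimal Python sketch of a rule as a data structure, with the container glob and the operator applied to a single sample. The class name `AlertRule`, its attribute names, and the use of `fnmatch`-style globbing are assumptions for illustration; dockmesh's real schema and glob dialect may differ. The sketch also assumes that the `>=` and `<=` aliases behave like strict `gt` and `lt`, which the table implies but does not state outright.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
import operator

# Operator aliases from the table above.
# Assumption: ">=" and "<=" are treated as plain gt / lt (strict comparison).
_OPERATORS = {
    "gt": operator.gt, ">": operator.gt, ">=": operator.gt,
    "lt": operator.lt, "<": operator.lt, "<=": operator.lt,
}


@dataclass
class AlertRule:
    """Hypothetical in-memory shape of a rule; attribute names are illustrative."""
    name: str
    metric: str              # "cpu_percent" or "mem_percent"
    container_filter: str    # glob against the container name
    op: str                  # "gt" / "lt" or one of the aliases
    threshold: float         # percentage
    duration_sec: int
    severity: str            # "info", "warning", or "critical"
    cooldown_sec: int
    channels: list = field(default_factory=list)  # notification-channel IDs

    def matches(self, container_name: str) -> bool:
        # Assumption: fnmatch-style globbing, case-sensitive.
        return fnmatchcase(container_name, self.container_filter)

    def breaches(self, value: float) -> bool:
        # Compare one metric sample against the threshold.
        return _OPERATORS[self.op](value, self.threshold)
```

With this shape, a rule filtered on paperless-* matches `paperless-web` but not `media-plex`, and a `gt 90` rule treats 91.0 as a breach and 90.0 as not.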

Alerts → Rules → New rule walks through the fields. The form previews the chosen container filter against the current container list so you can sanity-check the glob.

Example: “Alert if any container in paperless-* runs above 90% memory for 5 minutes, notify Slack + Email, cooldown 30 minutes, severity warning.”
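
The "for 5 minutes" part of that example can be pictured as a timer that starts when the threshold is first crossed and resets whenever the metric dips back. The following is a rough sketch under that assumption; `HoldTracker` is a hypothetical name, and the engine's actual bookkeeping is not documented here.

```python
from typing import Optional


class HoldTracker:
    """Tracks how long a breach has lasted for one rule/container pair."""

    def __init__(self, duration_sec: float) -> None:
        self.duration_sec = duration_sec
        self.breach_started: Optional[float] = None

    def observe(self, breaching: bool, now: float) -> bool:
        """Record one evaluation; return True once the breach has held long enough."""
        if not breaching:
            # Assumption: any sample back inside the threshold resets the clock.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.duration_sec


# The example rule: memory above 90% for 5 minutes, evaluated every 30 seconds.
tracker = HoldTracker(duration_sec=300)
fired_at = None
for tick in range(12):
    now = tick * 30
    if tracker.observe(breaching=95.0 > 90, now=now):
        fired_at = now
        break
print(fired_at)  # 300
```

Under these assumptions the rule fires on the eleventh evaluation, 300 seconds after the first breaching sample.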

Stack-level, host-level, and host-tag-level scoping isn’t supported as a first-class concept. Express those scopes through a naming convention in the container_filter glob (paperless-*, media-*), or leave the filter as * and rely on each rule’s channel list for routing.

Each severity has its own icon, color, and default cooldown:

| Level | Color | Default cooldown |
| --- | --- | --- |
| Info | Blue | 4h |
| Warning | Amber | 30m |
| Critical | Red | 5m |

Severity-based routing, such as Slack receiving every alert and PagerDuty only critical ones, is configured through each rule’s channel list rather than on the channel itself.

Built-in channels (Alerts → Channels):

  • Email — SMTP host + credentials, supports STARTTLS
  • Slack — incoming-webhook URL
  • Discord — webhook URL
  • Microsoft Teams — Incoming Webhook connector URL
  • ntfy — topic URL, optional auth
  • Gotify — server URL + app token
  • Generic webhook — POST JSON to any URL
  • PagerDuty — Events API v2 integration key. Dedup-key is derived from rule+container so repeated fires fold into one PD incident.
  • Pushover — app_token + user_key, optional device + sound. Critical alerts map to priority 1 (visual alert); emergency priority 2 is intentionally not exposed (would need ack-handling the UI doesn’t have yet).
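
As an illustration of the PagerDuty dedup behaviour, the sketch below builds an Events API v2 trigger event whose `dedup_key` depends only on the rule and the container. The hashing scheme and the function names are assumptions; the source only says the key is "derived from rule+container". The `routing_key`, `event_action`, `dedup_key`, and `payload` fields are the ones defined by the Events API v2.

```python
import hashlib


def dedup_key(rule_id: str, container: str) -> str:
    # Assumption: a stable hash of rule + container; dockmesh's real derivation
    # is not documented in this page.
    raw = f"{rule_id}:{container}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def pagerduty_event(integration_key: str, rule_id: str, rule_name: str,
                    container: str, severity: str, message: str) -> dict:
    """Build an Events API v2 trigger body (sent as JSON to the enqueue endpoint)."""
    return {
        "routing_key": integration_key,
        "event_action": "trigger",
        "dedup_key": dedup_key(rule_id, container),
        "payload": {
            "summary": f"{rule_name}: {message}",
            "source": container,
            # info / warning / critical are all valid Events API v2 severities.
            "severity": severity,
        },
    }
```

Because the key ignores the timestamp and the measured value, two fires of the same rule on the same container carry the same `dedup_key`, which is what lets PagerDuty fold them into one incident.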

See individual integration guides for per-channel setup. Telegram is not built-in — use the generic webhook against the Telegram Bot API if you need it.

Channels don’t filter by severity on their own — every rule lists which channel IDs it notifies, so “Slack gets everything, PagerDuty gets only critical” is expressed as: low-severity rules list Slack only, critical rules list both Slack and PagerDuty.
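
The routing pattern described above can be sketched as plain data. The channel IDs `"slack"` and `"pagerduty"` and the dictionary keys are placeholders for illustration, not dockmesh's actual identifiers.

```python
# Each rule carries its own channel list; severity routing falls out of that.
rules = [
    {"name": "Container memory > 90%", "severity": "warning",
     "channels": ["slack"]},
    {"name": "Container memory > 98% (near-OOM)", "severity": "critical",
     "channels": ["slack", "pagerduty"]},
]


def severities_seen_by(channel_id: str, rules: list) -> set:
    """Return the set of severities that can reach a given channel."""
    return {r["severity"] for r in rules if channel_id in r["channels"]}


print(severities_seen_by("slack", rules) == {"warning", "critical"})  # True
print(severities_seen_by("pagerduty", rules) == {"critical"})         # True
```

A quick check like this is a simple way to confirm that a paging channel is attached only to the rules that should page.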

Without a cooldown, a container that stays above its threshold would trigger a notification on every 30-second evaluation. The cooldown suppresses duplicate alerts on the same resource for the configured window. If the underlying state clears and the rule fires again, you get a new alert.
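
A minimal sketch of that suppression logic follows. It assumes the cooldown is tracked per rule and container pair, and that a resolve clears the window so the next fire notifies immediately; both are one reading of the paragraph above, not confirmed engine behaviour. `CooldownGate` is a hypothetical name.

```python
class CooldownGate:
    """Suppresses repeat notifications within a cooldown window."""

    def __init__(self, cooldown_sec: float) -> None:
        self.cooldown_sec = cooldown_sec
        self._last_notified = {}  # (rule, container) -> time of last notification

    def should_notify(self, rule: str, container: str, now: float) -> bool:
        key = (rule, container)
        last = self._last_notified.get(key)
        if last is not None and now - last < self.cooldown_sec:
            return False  # still inside the cooldown window
        self._last_notified[key] = now
        return True

    def resolved(self, rule: str, container: str) -> None:
        # Assumption: clearing the state ends the cooldown early.
        self._last_notified.pop((rule, container), None)
```

With a 30-minute cooldown (1800 seconds), a fire at t=0 notifies, the evaluations at t=30 through t=1770 stay silent, and the one at t=1800 notifies again.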

Each rule has two independent off-switches:

  • enabled = false — the engine doesn’t evaluate the rule at all. Use this for permanent off.
  • muted_until = <timestamp> — the engine evaluates the rule but suppresses notifications until that timestamp passes. Set to a few hours ahead during planned maintenance.

There’s no filter-based mute page that silences across many rules at once. To temporarily quiet a noisy area during maintenance, set muted_until on the rules involved or disable them and re-enable after.
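
The difference between the two switches can be sketched as a small decision function. The names `enabled` and `muted_until` come from the text above; the function itself and its return shape are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def rule_state(enabled: bool, muted_until: Optional[datetime],
               now: datetime) -> Tuple[bool, bool]:
    """Return (evaluate, notify) for a rule at time `now`."""
    if not enabled:
        return (False, False)   # disabled: not evaluated at all
    muted = muted_until is not None and now < muted_until
    return (True, not muted)    # muted: evaluated, but notifications suppressed


def mute_for(hours: float, now: datetime) -> datetime:
    """Compute a muted_until value a few hours ahead, e.g. for maintenance."""
    return now + timedelta(hours=hours)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(rule_state(True, mute_for(3, now), now))  # (True, False)
```

To quiet several rules for a maintenance window, the same `muted_until` value would be applied to each rule involved.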

Alerts → History shows every fire, sorted by occurred_at descending:

  • Occurred at
  • Rule name
  • Severity
  • Container that tripped the threshold
  • Status (fired / resolved)
  • Value vs. threshold
  • Message

Export to a downloadable format isn’t wired up yet — for post-mortems, query the alert_history table directly or use the REST endpoint GET /api/v1/alerts/history?limit=N.
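
For a post-mortem script, the REST endpoint can be called with the standard library alone. This is a sketch: the base URL, the bearer-token header, and the assumption that the response body is JSON are not stated in this page and would need to be checked against your deployment.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def history_url(base_url: str, limit: int) -> str:
    """Build the URL for GET /api/v1/alerts/history?limit=N."""
    query = urllib.parse.urlencode({"limit": limit})
    return f"{base_url.rstrip('/')}/api/v1/alerts/history?{query}"


def fetch_history(base_url: str, limit: int, token: Optional[str] = None):
    """Fetch recent alert history; assumes a JSON response body."""
    request = urllib.request.Request(history_url(base_url, limit))
    if token:
        # Assumption: bearer-token auth; the real scheme may differ.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


print(history_url("https://dockmesh.example.com/", 50))
# https://dockmesh.example.com/api/v1/alerts/history?limit=50
```

The returned data could then be written out with `json.dump` or the `csv` module to get the downloadable file that the UI does not yet provide.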

dockmesh ships with four container-level defaults on first install so every new deployment has coverage from day one:

| Rule | Metric | Threshold | Duration | Severity |
| --- | --- | --- | --- | --- |
| Container CPU > 90% (sustained) | cpu_percent | gt 90 | 5 min | warning |
| Container CPU > 95% (critical) | cpu_percent | gt 95 | 15 min | critical |
| Container memory > 90% | mem_percent | gt 90 | 5 min | warning |
| Container memory > 98% (near-OOM) | mem_percent | gt 98 | 60s | critical |

Built-in rules are flagged with a “built-in” badge in the Alerts table. They can be edited (change threshold, duration, mute, attach channels) and disabled, but not deleted — disabling them is the supported way to opt out. Deletion returns 409 Conflict from the API.

Host-level rules (disk, agent-offline, backup-job-failed) need per-host metrics that aren’t emitted yet — they’ll ship with follow-up slices that add the collectors.