Alerts

dockmesh evaluates alert rules against the metrics pipeline every 30 seconds. When a rule fires, notifications go out via the configured channels.

A rule has:

| Field | Example |
| --- | --- |
| Name | Prod CPU > 80% |
| Metric | cpu_percent, memory_percent, container_restarts, disk_used_percent, … |
| Scope | Container / Stack / Host / Host tag |
| Condition | > 80, < 10, == 0, increased by 5 in 10m |
| Window | 5m, 15m, 1h |
| Severity | info, warning, critical |
| Cooldown | Time between re-alerts on the same resource |
| Channels | Where to send |

Alerts → Rules → New rule walks through the fields. A live preview shows how many resources currently match the scope and would trigger.

Example: “Alert if any container in stacks tagged prod restarts more than 3 times in 10 minutes, notify Slack + PagerDuty, cooldown 30m, severity critical.”
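Condition and window work together: a rule fires only when the condition holds across the whole window, not just at the latest sample. A minimal sketch of that evaluation (function and type names are hypothetical, not dockmesh internals):

```python
from dataclasses import dataclass

@dataclass
class Sample:
    ts: float      # unix timestamp
    value: float   # metric value at that time

def fires(samples: list[Sample], threshold: float, window_s: float, now: float) -> bool:
    """True if the metric exceeded `threshold` for the entire window.

    Mirrors a rule like "cpu_percent > 80 for 5m": every sample inside
    the window must violate the threshold.
    """
    recent = [s for s in samples if s.ts >= now - window_s]
    return bool(recent) and all(s.value > threshold for s in recent)

def increased_by(samples: list[Sample], delta: float, window_s: float, now: float) -> bool:
    """Mirrors "increased by N in 10m": compare oldest vs newest in the window."""
    recent = sorted((s for s in samples if s.ts >= now - window_s), key=lambda s: s.ts)
    return len(recent) >= 2 and recent[-1].value - recent[0].value >= delta
```

The sustained-condition form explains why a brief CPU spike does not page anyone: one sample above the threshold is not enough.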

Each severity has its own icon, color, and default cooldown:

| Level | Color | Default cooldown |
| --- | --- | --- |
| Info | Blue | 4h |
| Warning | Amber | 30m |
| Critical | Red | 5m |

Channels can be filtered by severity — e.g. Slack gets all, PagerDuty only critical.
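Per-channel severity filtering amounts to a minimum-severity check at notify time. A sketch of that routing (the channel config shape here is illustrative, not dockmesh's actual schema):

```python
SEVERITIES = ["info", "warning", "critical"]

# Hypothetical config: each channel declares the lowest severity it accepts.
CHANNELS = {
    "slack":     "info",      # gets everything
    "pagerduty": "critical",  # only pages on critical
}

def channels_for(severity: str) -> list[str]:
    """Return the channels that should receive an alert of this severity."""
    rank = SEVERITIES.index(severity)
    return [name for name, floor in CHANNELS.items()
            if rank >= SEVERITIES.index(floor)]
```

With this config a warning reaches Slack only, while a critical alert fans out to both channels.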

Built-in channels (Settings → Channels):

  • Email — SMTP host + credentials, supports STARTTLS
  • Slack — incoming-webhook URL
  • Discord — webhook URL
  • Microsoft Teams — Incoming Webhook connector URL
  • ntfy.sh — topic URL, optional auth
  • Gotify — server URL + app token
  • Generic webhook — POST JSON to any URL
  • PagerDuty — Events API v2 integration key. Dedup-key is derived from rule+container so repeated fires fold into one PD incident.
  • Pushover — app_token + user_key, optional device + sound. Critical alerts map to priority 1 (visual alert); emergency priority 2 is intentionally not exposed (would need ack-handling the UI doesn’t have yet).
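The PagerDuty folding behaviour relies on the Events API v2 `dedup_key` field: repeated events carrying the same key update one incident instead of opening new ones. The exact derivation dockmesh uses is not documented, so the hashing scheme below is an assumption; the payload shape is the standard Events API v2 trigger:

```python
import hashlib

def dedup_key(rule_name: str, container_id: str) -> str:
    # Stable key: the same rule firing on the same container always
    # maps to the same incident. (Derivation scheme is an assumption.)
    return hashlib.sha256(f"{rule_name}:{container_id}".encode()).hexdigest()[:32]

def pd_trigger_event(rule_name: str, container_id: str,
                     summary: str, routing_key: str) -> dict:
    """Build a PagerDuty Events API v2 "trigger" payload."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key(rule_name, container_id),
        "payload": {
            "summary": summary,
            "source": container_id,
            "severity": "critical",
        },
    }
```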

See individual integration guides for per-channel setup. Telegram is not built-in — use the generic webhook against the Telegram Bot API if you need it.
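One way to bridge the generic webhook to Telegram is a tiny relay that reshapes dockmesh's JSON into a Bot API `sendMessage` call. The incoming field names below are assumptions (the generic-webhook payload schema isn't documented here); the Telegram URL format is the standard Bot API one:

```python
import json
import urllib.request

def relay_to_telegram(alert: dict, bot_token: str, chat_id: str) -> urllib.request.Request:
    # `alert` is the JSON body dockmesh POSTs; these field names are assumed.
    text = (f"[{alert.get('severity', '?').upper()}] "
            f"{alert.get('rule', 'unknown rule')}: "
            f"{alert.get('resource', '?')} = {alert.get('value', '?')}")
    body = json.dumps({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(
        f"https://api.telegram.org/bot{bot_token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(req) would actually send the message; omitted here.
```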

Without cooldown, a container stuck in a crash loop would generate an alert on every 30-second evaluation. Cooldown suppresses duplicate alerts on the same resource for the configured window. When the underlying condition clears and later fires again, you get a new alert.

Alerts → Mute rules lets you temporarily silence alerts matching a filter:

  • Mute everything on host prod-01 for 2 hours (during maintenance)
  • Mute warnings from stack experimental-app until Friday
  • Mute a specific rule indefinitely (similar in effect to disabling it)

Mutes are a separate concept from disabling — disabled rules don’t evaluate at all; muted rules evaluate but don’t notify.
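That distinction can be sketched as a short evaluation flow. All names are hypothetical, and whether muted fires are recorded in History isn't stated above, so that part is an assumption:

```python
def handle_rule(rule: dict, fired: bool, history: list, notify) -> None:
    # Disabled: skipped entirely — no evaluation, no history entry.
    if rule["disabled"]:
        return
    if not fired:
        return
    history.append(rule["name"])  # assumed: muted fires still land in history
    if not rule["muted"]:
        notify(rule["name"])      # only unmuted rules reach channels
```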

Alerts → History shows every alert fired, with:

  • Timestamp
  • Rule name
  • Resource (container / stack / host)
  • Severity
  • Value that tripped the threshold
  • Which channels received it
  • Resolution time (if auto-resolved)

Export to CSV for compliance or post-mortems.
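For a post-mortem, the export can be sliced with the standard csv module. The header names below mirror the history fields listed above, but the exact spelling in the real export is an assumption:

```python
import csv
import io

SAMPLE_EXPORT = """\
timestamp,rule,resource,severity,value,channels,resolved_at
2024-05-01T10:00:00Z,Prod CPU > 80%,web-1,critical,93.2,slack;pagerduty,2024-05-01T10:12:00Z
2024-05-01T11:00:00Z,Container memory > 90%,db-1,warning,91.0,slack,
"""

def critical_alerts(csv_text: str) -> list[dict]:
    """Filter an exported alert history down to critical fires."""
    return [row for row in csv.DictReader(io.StringIO(csv_text))
            if row["severity"] == "critical"]
```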

dockmesh ships with four container-level defaults on first install so every new deployment has coverage from day one:

| Rule | Metric | Threshold | Duration | Severity |
| --- | --- | --- | --- | --- |
| Container CPU > 90% (sustained) | cpu_percent | > 90 | 5 min | warning |
| Container CPU > 95% (critical) | cpu_percent | > 95 | 15 min | critical |
| Container memory > 90% | mem_percent | > 90 | 5 min | warning |
| Container memory > 98% (near-OOM) | mem_percent | > 98 | 60 s | critical |

Built-in rules are flagged with a “built-in” badge in the Alerts table. They can be edited (change threshold, duration, mute, attach channels) and disabled, but not deleted — disabling them is the supported way to opt out. Deletion returns 409 Conflict from the API.

Host-level rules (disk, agent-offline, backup-job-failed) need per-host metrics that aren’t emitted yet — they’ll ship with follow-up slices that add the collectors.