Alerts
dockmesh evaluates alert rules against the metrics pipeline every 30 seconds. When a rule fires, notifications go out via the configured channels.
Rule anatomy
A rule has:
| Field | Accepted values |
|---|---|
| Name | Free text |
| Metric | cpu_percent or mem_percent (the only two metrics the engine currently evaluates) |
| Container filter | A glob against the container name. * matches every container, paperless-* matches just the paperless stack, etc. |
| Operator | gt (also accepts >/>=) or lt (also </<=). Equality and delta-style operators are not implemented. |
| Threshold | The percentage the metric is compared against |
| Duration (sec) | How long the threshold must hold before the alert fires |
| Severity | info, warning, or critical |
| Cooldown (sec) | Suppress re-notify on the same rule for this long after a fire |
| Channels | List of notification-channel IDs that receive the fire |
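The fields above can be modeled as a small record plus two checks: does the container name match the glob, and does the metric value breach the threshold. The following is a minimal sketch, not dockmesh's actual schema or engine code. The field names are illustrative, Python's `fnmatchcase` stands in for whatever glob matcher the engine uses, and treating `>=` and `<=` as aliases for a strict comparison is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    # Illustrative field names that mirror the table above (not the real schema).
    name: str
    metric: str            # "cpu_percent" or "mem_percent"
    container_filter: str  # glob against the container name
    operator: str          # "gt" (or ">", ">=") / "lt" (or "<", "<=")
    threshold: float
    duration_sec: int
    severity: str          # "info", "warning", or "critical"
    cooldown_sec: int
    channels: list[str]


GT_ALIASES = {"gt", ">", ">="}
LT_ALIASES = {"lt", "<", "<="}


def matches_container(rule: Rule, container_name: str) -> bool:
    """True if the container name matches the rule's glob."""
    return fnmatchcase(container_name, rule.container_filter)


def breaches(rule: Rule, value: float) -> bool:
    """True if the value is on the alerting side of the threshold.

    Assumption: every alias of an operator behaves as a strict comparison.
    """
    if rule.operator in GT_ALIASES:
        return value > rule.threshold
    if rule.operator in LT_ALIASES:
        return value < rule.threshold
    raise ValueError(f"unsupported operator: {rule.operator!r}")
```

With a filter of paperless-*, `matches_container` accepts paperless-web and rejects media-server, which is the same check the rule form's preview lets you do by eye.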
Creating a rule
Alerts → Rules → New rule walks through the fields. The form previews the chosen container filter against the current container list so you can sanity-check the glob.
Example: “Alert if any container in paperless-* runs above 90% memory for 5 minutes, notify Slack + Email, cooldown 30 minutes, severity warning.”
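Written out as data, the example looks roughly like the dictionary below. The key names and channel IDs are hypothetical. The helper then illustrates what "for 5 minutes" means with a 30-second evaluation interval: the breach has to persist across consecutive evaluations until 300 seconds have elapsed. This is a sketch of one plausible way to track duration, not a description of the engine's internals.

```python
EVAL_INTERVAL_SEC = 30  # evaluation cadence stated at the top of this page

example_rule = {
    "name": "paperless memory high",
    "metric": "mem_percent",
    "container_filter": "paperless-*",
    "operator": "gt",
    "threshold": 90,
    "duration_sec": 5 * 60,    # 5 minutes
    "severity": "warning",
    "cooldown_sec": 30 * 60,   # 30 minutes
    "channels": ["slack-main", "email-ops"],  # hypothetical channel IDs
}


def first_fire_index(samples, rule, interval=EVAL_INTERVAL_SEC):
    """Return the index of the sample at which the rule would first fire.

    `samples` holds one metric value per evaluation for a single container.
    The breach must hold continuously for `duration_sec`; any sample at or
    below the threshold resets the clock. Returns None if it never fires.
    """
    breach_started = None
    for i, value in enumerate(samples):
        now = i * interval
        if value > rule["threshold"]:
            if breach_started is None:
                breach_started = now
            if now - breach_started >= rule["duration_sec"]:
                return i
        else:
            breach_started = None
    return None
```

Under these assumptions, a container that stays above 90% fires on the eleventh consecutive evaluation (300 seconds after the first breaching sample), and a single dip below the threshold restarts the count.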
Stack-level / host-level / host-tag-level scoping isn’t supported as a first-class concept — express those by naming convention in the container_filter glob (paperless-*, media-*) or by leaving it * and relying on the per-channel routing on each rule.
Severity levels
Each severity has its own icon, color, and default cooldown:
| Level | Color | Default cooldown |
|---|---|---|
| Info | Blue | 4h |
| Warning | Amber | 30m |
| Critical | Red | 5m |
Routing by severity is configured per rule through its channel list, not on the channel itself. For example, only critical rules would list PagerDuty.
Notification channels
Built-in channels (Alerts → Channels):
- Email — SMTP host + credentials, supports STARTTLS
- Slack — incoming-webhook URL
- Discord — webhook URL
- Microsoft Teams — Incoming Webhook connector URL
- ntfy — topic URL, optional auth
- Gotify — server URL + app token
- Generic webhook — POST JSON to any URL
- PagerDuty — Events API v2 integration key. Dedup-key is derived from rule+container so repeated fires fold into one PD incident.
- Pushover — app_token + user_key, optional device + sound. Critical alerts map to priority 1 (visual alert); emergency priority 2 is intentionally not exposed (would need ack-handling the UI doesn’t have yet).
See individual integration guides for per-channel setup. Telegram is not built-in — use the generic webhook against the Telegram Bot API if you need it.
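To illustrate why a dedup key built from rule and container folds repeated fires into one incident, here is a sketch of a PagerDuty Events API v2 trigger body. The top-level fields (`routing_key`, `event_action`, `dedup_key`, `payload`) are part of the Events API v2. How dockmesh actually derives the key is not documented here, so the hashing scheme and the function name are assumptions.

```python
import hashlib


def pagerduty_event(integration_key, rule_name, container, severity, summary):
    """Build an Events API v2 trigger body for one alert fire.

    Assumption: the dedup key is a hash of rule name plus container name.
    Any stable function of the two would fold repeats the same way.
    """
    raw = f"{rule_name}\x00{container}".encode("utf-8")
    dedup_key = hashlib.sha256(raw).hexdigest()
    return {
        "routing_key": integration_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": summary,
            "source": container,
            "severity": severity,  # PagerDuty accepts critical, error, warning, info
        },
    }
```

Two fires of the same rule on the same container produce the same `dedup_key`, so PagerDuty attaches the second event to the open incident instead of opening a new one.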
Channels don’t filter by severity on their own — every rule lists which channel IDs it notifies, so “Slack gets everything, PagerDuty gets only critical” is expressed as: low-severity rules list Slack only, critical rules list both Slack and PagerDuty.
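The routing pattern described above can be checked with a few lines of code. This sketch uses hypothetical rule names and channel IDs and simply inverts the rule-to-channel mapping to show which severities end up reaching each channel.

```python
# Hypothetical rules: only the fields relevant to routing are shown.
rules = [
    {"name": "mem info", "severity": "info", "channels": ["slack"]},
    {"name": "mem warning", "severity": "warning", "channels": ["slack"]},
    {"name": "mem critical", "severity": "critical", "channels": ["slack", "pagerduty"]},
]


def severities_reaching(channel_id, rules):
    """Return the sorted set of severities whose rules list this channel."""
    return sorted({r["severity"] for r in rules if channel_id in r["channels"]})
```

Here `severities_reaching("slack", rules)` covers all three levels, while `severities_reaching("pagerduty", rules)` contains only critical, which is the "Slack gets everything, PagerDuty gets only critical" setup expressed through per-rule channel lists.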
Cooldown
Without cooldown, a container that keeps crashing would spam alerts every 30 seconds. Cooldown suppresses duplicate alerts on the same resource for the configured window. When the underlying state clears and re-fires, you get a new alert.
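The behavior described above can be sketched as a small tracker that remembers the last notification time per key. The default cooldowns come from the severity table. Keying on a (rule, container) pair and resetting the window when the state clears are assumptions about the implementation, and the class name is hypothetical.

```python
# Default cooldowns per severity, in seconds (from the severity table).
DEFAULT_COOLDOWN_SEC = {
    "info": 4 * 60 * 60,
    "warning": 30 * 60,
    "critical": 5 * 60,
}


class CooldownTracker:
    """Suppress repeat notifications for the same key inside a time window."""

    def __init__(self):
        self._last_notified = {}  # key -> timestamp (seconds) of last notification

    def should_notify(self, key, now, cooldown_sec):
        """Return True and record `now` if the key is outside its cooldown."""
        last = self._last_notified.get(key)
        if last is not None and now - last < cooldown_sec:
            return False
        self._last_notified[key] = now
        return True

    def clear(self, key):
        """Forget the key once the condition resolves, so a re-fire notifies again."""
        self._last_notified.pop(key, None)
```

With a 30-minute cooldown, a fire at t=0 notifies, evaluations at t=30 through t=1799 stay quiet, and the next notification can go out at t=1800 or right after the alert resolves and fires again.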
Mute and disable
Each rule has two independent off-switches:
- enabled = false: the engine doesn’t evaluate the rule at all. Use this to turn a rule off permanently.
- muted_until = <timestamp>: the engine evaluates the rule but suppresses notifications until that timestamp passes. Set it a few hours ahead during planned maintenance.
There’s no filter-based mute page that silences across many rules at once. To temporarily quiet a noisy area during maintenance, set muted_until on the rules involved or disable them and re-enable after.
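How the two switches combine can be summarized in one function. This is a sketch under the assumption that a disabled rule is skipped regardless of its mute timestamp. The function name and the returned labels are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def rule_state(enabled, muted_until, now):
    """Classify what the engine does with a rule at time `now`.

    Returns "skipped" (not evaluated), "suppressed" (evaluated, but no
    notifications sent), or "notifying" (evaluated, notifications sent).
    """
    if not enabled:
        return "skipped"
    if muted_until is not None and now < muted_until:
        return "suppressed"
    return "notifying"


# Example: mute a rule for a three-hour maintenance window starting now.
maintenance_start = datetime.now(timezone.utc)
muted_until = maintenance_start + timedelta(hours=3)
```

A rule muted this way starts notifying again by itself once the timestamp passes, whereas a disabled rule stays off until someone re-enables it.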
Alert history
Alerts → History shows every fire, sorted by occurred_at descending:
- Occurred at
- Rule name
- Severity
- Container that tripped the threshold
- Status (fired/resolved)
- Value vs. threshold
- Message
Export to a downloadable format isn’t wired up yet — for post-mortems, query the alert_history table directly or use the REST endpoint GET /api/v1/alerts/history?limit=N.
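For a post-mortem script, the REST endpoint can be called with the Python standard library. Only the path and the `limit` parameter come from this page. The base URL, the bearer-token authentication, and a JSON response body are assumptions, so adjust them to match your deployment.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def history_url(base_url, limit):
    """Build the alert-history URL for a given dockmesh base URL."""
    return urljoin(base_url, "/api/v1/alerts/history") + "?" + urlencode({"limit": limit})


def fetch_history(base_url, limit, token=None):
    """Fetch the most recent alert fires.

    Assumptions: the server returns JSON, and (if `token` is given)
    accepts it as a bearer token. Neither is confirmed by this page.
    """
    req = Request(history_url(base_url, limit), headers={"Accept": "application/json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_history("https://dockmesh.example.com", 100)` would request the last 100 entries from a server at that placeholder address.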
Built-in rules
dockmesh ships with four container-level defaults on first install so every new deployment has coverage from day one:
| Rule | Metric | Threshold | Duration | Severity |
|---|---|---|---|---|
| Container CPU > 90% (sustained) | cpu_percent | gt 90 | 5 min | warning |
| Container CPU > 95% (critical) | cpu_percent | gt 95 | 15 min | critical |
| Container memory > 90% | mem_percent | gt 90 | 5 min | warning |
| Container memory > 98% (near-OOM) | mem_percent | gt 98 | 60s | critical |
Built-in rules are flagged with a “built-in” badge in the Alerts table. They can be edited (change threshold, duration, mute, attach channels) and disabled, but not deleted — disabling them is the supported way to opt out. Deletion returns 409 Conflict from the API.
Host-level rules (disk, agent-offline, backup-job-failed) need per-host metrics that aren’t emitted yet — they’ll ship with follow-up slices that add the collectors.
See also
- Integrations · Slack — webhook setup
- Integrations · Discord — webhook setup
- Integrations · Telegram — bot setup
- Integrations · ntfy.sh — self-hosted push