An OpenNMS Horizon plugin that pushes performance data to any Prometheus-compatible Remote Write endpoint — Prometheus, Cortex, Grafana Mimir, VictoriaMetrics, Thanos Receive — and surfaces OpenNMS resource context as native Prometheus labels.
> **Incubation project — community support only**
> This is an incubation project of the OpenNMS Community. Support is available at opennms.discourse.group. No commercial support is available yet.
Overview
What this plugin does
prometheus-remote-writer implements the OpenNMS TimeSeriesStorage SPI from
opennms-integration-api v2.0. When OpenNMS Horizon is configured with
org.opennms.timeseries.strategy = integration and this plugin is the only
TSS implementation registered, every collected sample flows through the
plugin to a Prometheus-compatible Remote Write endpoint of your choice.
The plugin pushes OpenNMS resource context — node identity, foreign-source qualification, surveillance categories, interface descriptors, optional metadata — to the backend as native Prometheus labels. Operators query the resulting time series with PromQL directly, from Grafana’s native Prometheus data source. No OpenNMS-side query plugin and no round-trip to the OpenNMS REST API are required at query time.
What this plugin does NOT do

> **Horizon Core only**
> This plugin runs on OpenNMS Horizon Core only. It is not yet supported on OpenNMS Sentinel.
The KAR will install cleanly into /opt/sentinel/deploy/ and the
plugin’s TimeSeriesStorage OSGi service will register, but no
samples will reach it: OpenNMS upstream has no Core → Sentinel
sample-dispatch path for OIA TSS plugins, and the Sentinel-side
streaming-telemetry adapter pipeline (architecturally compatible
in principle) has not been verified end-to-end with this plugin.
If you need to offload sample persistence to a Sentinel container today, this is not the right plugin yet. Install it on Horizon Core and use the rest of this guide.
Where it fits
store(samples)
│
▼
┌──────────────────────┐
OpenNMS ────▶│ TimeSeriesStorage │
collectors │ (this plugin) │
└──────────────────────┘
│
▼ Remote Write v1 / v2
┌──────────────────────┐
│ Prometheus / Mimir │
│ VictoriaMetrics / │
│ Cortex / Thanos │
└──────────────────────┘
▲
│ PromQL via Prom HTTP API
┌──────────────────────┐
│ Grafana (native │
│ Prometheus DS) │
└──────────────────────┘
Quick comparison vs. the legacy Cortex TSS plugin
OpenNMS ships a Prometheus integration today via
opennms-cortex-tss-plugin.
That plugin writes numeric samples to a Prometheus backend but keeps
OpenNMS resource context (node label, foreign source, categories, asset
record, interface descriptors) in a separate OpenNMS key-value store. To
turn opaque resourceId labels back into human-readable resources, you
need the
OpenNMS Plugin for Grafana,
which round-trips to the OpenNMS REST API at query time.
This plugin fixes that at the write path. Resource context is pushed to the backend as first-class Prometheus labels, so PromQL — on any vanilla Grafana Prometheus data source — works end-to-end with no OpenNMS query-time dependency.
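For example, with the default label set in place, an operator can slice interface counters by requisition and human-readable node name straight from a vanilla Prometheus data source (label values here are illustrative):

```promql
# Per-interface traffic for every node in one requisition,
# grouped by human-readable node name — no OpenNMS API round-trip
sum by (node_label, if_name) (
  rate(ifHCInOctets{job="snmp", foreign_source="NOC"}[5m])
)
```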
Supported backends at a glance
| Backend | Remote Write v1 | Remote Write v2 | Notes |
|---|---|---|---|
| Prometheus 3.x | ✅ | ✅ | v2 receiver default-enabled |
| Prometheus 2.55+ | ✅ | ✅ | Receiver must be enabled explicitly |
| Prometheus 2.50–2.54 | ✅ | ⚠ silently drops | Stay on v1 — see Wire protocols (v1 and v2) |
| Grafana Mimir 2.10+ | ✅ | ✅ | |
| VictoriaMetrics | ✅ | ✅ (with v2 ingest) | |
| Cortex | ✅ | ✅ | |
| Thanos Receive | ✅ | ✅ | |
| Grafana Cloud | ✅ | ✅ | |
Reference specifications
The plugin is a clean-room implementation, written from public specifications:

- Prometheus protobuf definitions (`prompb/remote.proto`, `prompb/types.proto`; Apache 2.0 upstream)
- Sanitization rules from `prometheus/common` (Apache 2.0 upstream)

The normative requirement set lives in the project’s tss-plugin spec under openspec/specs/tss-plugin/spec.md.
Installation
Compatibility
| Component | Version |
|---|---|
| OpenNMS Horizon | 35+ |
| JVM | Temurin / OpenJDK 17 (matches the Horizon container) |
| Apache Karaf | 4.4.10 |
| Integration API | v2.0 |
The plugin’s OSGi bundle is compiled to Java 17 bytecode to match Horizon’s runtime; running on an older JVM fails feature resolution at install time.
Install on OpenNMS Horizon Core
The plugin ships as a Karaf KAR (prometheus-remote-writer-kar-X.Y.Z.kar).
Drop it into Karaf’s deploy directory and Karaf hot-installs it:
# Download the KAR from the GitHub Release matching your installed version.
# Example for v0.3.0:
curl -L -o /opt/opennms/deploy/prometheus-remote-writer.kar \
{project-repo-url}/releases/download/v{revnumber}/{project-artifact}-kar-{revnumber}.kar
Confirm from the Karaf shell:
ssh -p 8101 admin@localhost
karaf@root()> bundle:list -s | grep prometheus-remote-writer
Activate the plugin as the active TSS
Set the time-series strategy in etc/opennms.properties.d/timeseries.properties:
org.opennms.timeseries.strategy = integration
Restart Core for the strategy switch to take effect. The next collector flush sends samples through the plugin.
> **Drop in the minimal metatag config before you restart**
> By default, OpenNMS does not attach node, foreign-source, location, or interface tags to samples — they arrive at the plugin carrying only `name` and `resourceId`. See Labels and enrichment.
Minimum configuration
Drop the bare minimum at etc/org.opennms.plugins.tss.prometheus-remote-writer.cfg:
write.url = https://mimir.example.com/api/v1/push
read.url = https://mimir.example.com/prometheus
read.url is the backend’s Prometheus-compatible root. The plugin
appends /api/v1/series and /api/v1/query_range itself — do not
include /api/v1 in the configured value. See
Configuration reference for the full list of knobs and
backend-specific URL shapes.
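The expansion rule above is mechanical; this illustrative sketch (the `read_endpoints` helper is hypothetical, not plugin code) shows what the plugin actually calls for a given configured root:

```python
# Sketch: how a configured read.url expands into the two endpoints the
# plugin appends itself. Configure the root, never the /api/v1 suffix.
def read_endpoints(read_url: str) -> dict:
    root = read_url.rstrip("/")          # tolerate a trailing slash
    return {
        "series": f"{root}/api/v1/series",
        "query_range": f"{root}/api/v1/query_range",
    }

eps = read_endpoints("https://mimir.example.com/prometheus")
print(eps["series"])       # https://mimir.example.com/prometheus/api/v1/series
print(eps["query_range"])  # https://mimir.example.com/prometheus/api/v1/query_range
```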
Verify
From the Karaf shell:
karaf@root()> bundle:list -s | grep prometheus-remote-writer
karaf@root()> opennms:prometheus-writer-stats
opennms:prometheus-writer-stats prints all plugin counters and gauges.
Watch samples_written_total tick up as OpenNMS pushes its first samples
through the plugin.
> **Default labels need OpenNMS metatag config to be useful**
> By default, samples arriving at the plugin carry only `name` and `resourceId`; configure the metatag properties described in Labels and enrichment to get node-identity labels.
Local sandbox (e2e)
The repository’s e2e/ directory contains a Docker Compose stack that
stands up an OpenNMS Horizon core, Grafana, and one Prometheus-compatible
backend of your choice. It is the quickest way to see the plugin work
end-to-end on a laptop.
Prerequisites: Docker 24+ with Compose v2.
The fastest way to prove the plugin works is make smoke:
make smoke # all backends, sequential
make smoke BACKENDS=prometheus # single backend
make smoke TIMEOUT=900 BACKENDS=mimir # bump the per-backend deadline
For interactive exploration, see e2e/README.md in the source tree.
Configuration reference
All knobs live in etc/org.opennms.plugins.tss.prometheus-remote-writer.cfg (Karaf ConfigAdmin PID org.opennms.plugins.tss.prometheus-remote-writer) and
take effect on the next flush cycle — no OpenNMS restart required, except
where noted (typically wire-format and validator changes that require
bundle restart).
Endpoint and authentication
| Key | Default | Purpose |
|---|---|---|
| `write.url` | (required) | Remote Write v1/v2 ingest URL. |
| `read.url` | (required for read path) | Prometheus-compatible HTTP API root. The plugin appends `/api/v1/series` and `/api/v1/query_range` itself. |
| `instance.id` | (unset) | Stamps every sample with an `onms_instance_id` label. |
| `job.name` | (unset) | Override the per-sample `job` derivation with a constant. |
| `tenant.org-id` | (unset) | Sets the `X-Scope-OrgID` header. |
| `auth.basic.username` | (unset) | Basic auth — username. |
| `auth.basic.password` | (unset) | Basic auth — password. |
| `auth.bearer.token` | (unset) | Bearer auth. Mutually exclusive with Basic. |
| | (unset) | PEM bundle to trust in addition to / in place of the JDK truststore. |
| | | Disables hostname and certificate verification. Logs WARN on startup and every hour. |

The plugin refuses to start if both `auth.basic.*` and `auth.bearer.token` are configured.
Wire format
| Key | Default | Purpose |
|---|---|---|
| `wire.protocol-version` | `1` | Remote Write protocol version (`1` or `2`). See Wire protocols (v1 and v2). |
Write pipeline (in-memory)
Used when wal.enabled=false (the default). When the WAL is enabled,
queue.capacity is ignored.
| Key | Default | Purpose |
|---|---|---|
| `queue.capacity` | | Bounded in-memory queue. Overflow throws an exception. |
| `batch.size` | | Maximum samples per Remote Write POST. |
| `flush.interval-ms` | | Flush whenever the batch fills OR this interval elapses. |
| | | 5xx retry budget per batch. |
| | | First backoff after a 5xx. |
| | | Backoff ceiling. |
| | | Bounds the graceful-shutdown wait. With WAL enabled, bounds the in-flight HTTP wait, not a drain window. |
HTTP client
| Key | Default | Purpose |
|---|---|---|
| | | Socket connect timeout. |
| | | Response read timeout. |
| | | Request write timeout. |
| `http.max-connections` | | Per-route OkHttp connection pool size. |
Read path
| Key | Default | Purpose |
|---|---|---|
| | | |
Label policy
| Key | Default | Purpose |
|---|---|---|
| `labels.include` | (empty) | Glob list of source-tag keys to surface as labels in addition to the default allowlist. Snake-cased on the wire. See Labels and enrichment. |
| `labels.exclude` | (empty) | Default labels to drop. Comma-separated label names. |
| `labels.rename` | (empty) | Rename a label. Comma-separated `from -> to` pairs. |
| `labels.copy` | (empty) | Add a second name for a label. Comma-separated `from -> to` pairs. |
| | (empty) | Prefix added to every metric name on the wire (sanitized). |
Pipeline order is defaults → exclude → include → copy → rename → metadata. See Labels and enrichment for the full mental model and worked recipes.
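The fixed stage order can be modeled as plain dict operations. This is an illustrative sketch (the `apply_label_pipeline` helper and its simplified semantics are hypothetical, not plugin code):

```python
# Toy model of the documented order: defaults -> exclude -> copy -> rename.
# (include/metadata stages omitted; they merge extra labels upstream.)
def apply_label_pipeline(labels, exclude=(), copies=(), renames=()):
    out = dict(labels)                  # defaults
    for name in exclude:                # exclude: drop labels by name
        out.pop(name, None)
    for src, dst in copies:             # copy: add a second name, keep source
        if src in out:
            out[dst] = out[src]
    for src, dst in renames:            # rename: new name, source disappears
        if src in out:
            out[dst] = out.pop(src)
    return out

labels = {"node": "NOC:router-42", "foreign_source": "NOC"}
result = apply_label_pipeline(
    labels,
    copies=[("foreign_source", "tenant")],   # both names survive
    renames=[("node", "device")],            # only the new name survives
)
# result: {"foreign_source": "NOC", "tenant": "NOC", "device": "NOC:router-42"}
```

Note that copy runs before rename, which is why `labels.copy` sources are always pre-rename names.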
Metadata passthrough
OpenNMS metadata is opt-in only. The built-in denylist always applies.
| Key | Default | Purpose |
|---|---|---|
| `metadata.enabled` | `false` | Master switch. |
| `metadata.include` | (empty) | Glob list of metadata keys to pass through. |
| | (empty) | Glob list to subtract from the include list. |
| `metadata.label-prefix` | `onms_meta_` | Prefix applied to emitted metadata labels. |
| | | |
> Metadata is an open KV store; operators put credentials in there (requisition:snmp-community, jdbc:password, API tokens). The built-in denylist (password, secret, token, key, snmp-community) is always applied — even when `metadata.include` would match a denied key. Leave metadata disabled unless you have an explicit use case.
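If you do have a use case, keep the opt-in narrowly scoped. A sketch (the glob value is hypothetical; `metadata.enabled` and `metadata.include` are the keys documented above):

```properties
metadata.enabled = true
# Surface only explicitly whitelisted, non-sensitive keys
metadata.include = requisition:asset-*
```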
Write-Ahead Log (durable buffering)
Opt-in via wal.enabled=true. See Write-Ahead Log for the full operator model.
| Key | Default | Purpose |
|---|---|---|
| `wal.enabled` | `false` | Opt-in. |
| `wal.path` | (empty) | Set explicitly to redirect to a mounted volume. |
| `wal.max-size-bytes` | | Total disk-footprint cap. |
| | | Per-segment rotation threshold. |
| | | |
| | | |
Worked example — fully configured
# Endpoint
write.url = https://mimir.example.com/api/v1/push
read.url = https://mimir.example.com/prometheus
# Source identity
instance.id = opennms-us-east
# job.name unset — job is derived per-sample from resourceId shape
# Authentication
auth.bearer.token = ${env:MIMIR_TOKEN}
tenant.org-id = fleet-prod
# Wire format
wire.protocol-version = 2
# Write pipeline (in-memory, used because wal.enabled=false)
queue.capacity = 50000
batch.size = 2000
flush.interval-ms = 1000
# HTTP
http.max-connections = 32
# Label policy
labels.include = sysDescription, assetRegion
labels.copy = foreign_source -> tenant
metadata.enabled = false
The full set of normative scenarios lives in
openspec/specs/tss-plugin/spec.md in the source tree — that file is the
authoritative source of truth for parser behavior, defaults, and
edge cases.
Wire protocols (v1 and v2)
The plugin supports both Prometheus Remote Write protocol versions.
Operator-selectable per deployment via wire.protocol-version.
Selection
| Value | Wire format | Headers | Backend requirement |
|---|---|---|---|
| `1` | Snappy-compressed `prompb.WriteRequest` | | Any backend that accepts Remote Write v1 — Prometheus, Mimir, VictoriaMetrics, Cortex, Thanos Receive. |
| `2` | Snappy-compressed Remote Write 2.0 request | | Prometheus 3.0+ recommended (default-enabled receiver). Earlier versions: 2.55+ stable but receiver must be enabled explicitly. |
When to flip to v2

- Forward capacity. Native histograms, exemplars, per-series metadata, and created-timestamps are first-class in the v2 schema. The plugin does not populate them today (OpenNMS doesn’t produce them), but enabling v2 unblocks future features without another wire-format pivot.
- Wire bandwidth. v2’s string interning eliminates per-sample repetition of label names and values. For typical OpenNMS batches (every series carries the same dozen-or-so default labels: `name`, `node`, `job`, `instance`, …), this is a real pre-snappy reduction; the magnitude depends on batch size and label-sharing. After snappy the savings are smaller — measure your own deployment before flipping for bandwidth reasons alone.
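The interning idea can be sketched as follows. This is a toy model of the concept, not the actual protobuf encoding (`intern_series` is a hypothetical helper):

```python
# Each distinct label string is stored once in a symbol table; series
# then reference strings by index, so shared labels are not repeated.
def intern_series(series_labels):
    symbols, index = [], {}

    def ref(s):
        if s not in index:
            index[s] = len(symbols)
            symbols.append(s)
        return index[s]

    encoded = [
        [ref(part) for kv in labels.items() for part in kv]
        for labels in series_labels
    ]
    return symbols, encoded

series = [
    {"__name__": "ifHCInOctets",  "job": "snmp", "node": "NOC:r1"},
    {"__name__": "ifHCOutOctets", "job": "snmp", "node": "NOC:r1"},
]
symbols, encoded = intern_series(series)
# "job", "snmp", "node", "NOC:r1" each appear exactly once in `symbols`,
# even though both series carry them.
```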
When to leave it on v1

- Backend compatibility is uncertain or includes older Prometheus / Mimir / VictoriaMetrics versions.
- Wire bandwidth isn’t a constraint for your deployment volume.
Operational notes
> **Prometheus 2.50–2.54 silently drops v2 payloads**
> The receiver was experimental in that range — payloads can return 2xx (or 204) yet the samples never appear at query time. Pin Prometheus 3.0+ for v2: the receiver is default-enabled and stable in 3.0+, and 2.55+ works if you enable the receiver explicitly.

> **No auto-fallback**
> If the backend rejects the configured protocol version, the plugin does not downgrade automatically; fix the configuration or the backend.

> **WAL is wire-version-agnostic**
> The on-disk WAL stores pre-wire samples, so pending WAL data survives a protocol-version flip. See Wire-version interaction under Write-Ahead Log.

> **No effect on the read path**
> The plugin’s read side (`read.url` and the appended `/api/v1/*` endpoints) is independent of `wire.protocol-version`.
What v2 does NOT add (yet)

- Native histograms — OpenNMS doesn’t produce them.
- Exemplars — no trace-ID source on the OpenNMS side.
- Per-series metadata — no `help`/`unit` source today.
- Created-timestamp counter-reset hint.

The v2 wire layer in this release leaves these fields empty. A future change can populate any of them without breaking the wire layer.
Write-Ahead Log
Opt-in via wal.enabled=true. When enabled, every store() sample is
appended to an on-disk Write-Ahead Log before the call returns, and the
WAL replaces the in-memory queue.capacity buffer as source of truth.
What the WAL gives you

- Restart preservation. Samples queued before a graceful shutdown replay to the endpoint on the next process start. Under the default `batch` fsync, a `kill -9` may lose the last fsync window’s worth of samples; everything before that is durable.
- Extended outage buffering. 5xx and transport failures never advance the WAL checkpoint. The plugin retries from the same offset on every flush cycle for as long as the endpoint stays down, up to the `wal.max-size-bytes` total footprint.
Configuration knobs (recap)
| Key | Default | Purpose |
|---|---|---|
| `wal.enabled` | `false` | Opt-in. When `true`, the WAL replaces the in-memory queue. |
| `wal.path` | (empty) | Empty resolves to a default directory; set explicitly to redirect to a mounted volume. |
| `wal.max-size-bytes` | | Total disk-footprint cap. Overflow policy fires when reached. |
| | | Per-segment rotation threshold. Must be ≤ the total cap. |
| | | |
| | | |
|
> **Knobs that change meaning when `wal.enabled=true`**
> `queue.capacity` is ignored, and the shutdown-drain timeout bounds only the in-flight HTTP wait (see Write pipeline above).
How it works, briefly
store(samples)
│
▼
LabelMapper.map()
│
▼
WAL.append(MappedSample) ◀── durable on disk
│
▼
WalFlusher.pollBatch() ──▶ HTTP POST ──2xx──▶ Checkpoint.advance(offset)
        │                                            │
        │ 4xx: advance checkpoint                    ▼
        │      (matches pre-WAL drop)         segments past the
        │ 5xx exhausted / transport:          checkpoint become
        │      leave checkpoint; re-read      eligible for deletion
        │      same batch next cycle
Segment files are named by start offset
(00000000000000000000.seg, 00000000000067108864.seg, …). Each
segment has a companion .idx jsonl summary. checkpoint.json at the
WAL root tracks the last offset confirmed shipped; written atomically
(tmp + fsync + rename) on every advance.
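The tmp + fsync + rename pattern guarantees the checkpoint file is either the old complete version or the new complete version, never a torn write. A sketch in Python (the `write_checkpoint` helper and the single-field JSON schema are illustrative, not the plugin's actual code):

```python
import json
import os
import tempfile

def write_checkpoint(wal_root: str, last_sent_offset: int) -> None:
    """Atomically persist the last confirmed-shipped offset."""
    final = os.path.join(wal_root, "checkpoint.json")
    fd, tmp = tempfile.mkstemp(dir=wal_root, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"last_sent_offset": last_sent_offset}, f)
            f.flush()
            os.fsync(f.fileno())   # durable before it becomes visible
        os.replace(tmp, final)     # atomic rename on POSIX filesystems
    finally:
        if os.path.exists(tmp):    # clean up only if the rename never ran
            os.remove(tmp)
```

Because the rename is atomic within one filesystem, a crash at any point leaves a readable checkpoint, which is what makes the startup recovery described below safe.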
Recovery on startup
When wal.enabled=true, the plugin scans wal.path at startup and
replays any samples whose offset is greater than or equal to
checkpoint.json.last_sent_offset. Recovery tolerates torn tails — an
incomplete frame at the end of the most recent segment, typical of a
process killed mid-append — by logging a WARN, truncating the segment to
the last good frame, and resuming.
If any recovery step fails (unwritable path, corrupt checkpoint.json,
unreadable segment), the plugin refuses to start with an actionable
error message naming the file and the failure reason. Operators reset
by removing the WAL directory while the plugin is stopped:
# while the plugin is stopped
rm -rf ${wal.path}
The plugin recreates the directory on next start.
Wire-version interaction
The WAL stores MappedSample (pre-wire). Flipping wire.protocol-version
while the WAL holds pending samples is safe — the next flush emits in the
new version with no WAL migration step. See Wire protocols (v1 and v2).
Operational guidance
| Choice | When |
|---|---|
| | You can justify the ~10× throughput hit for tighter-than-1s RPO. Rare. |
| | The right choice for almost everyone. Loses at most one fsync window of samples on hard kill. |
| | Ephemeral deployments that accept losing in-flight data on kernel panic / power loss. |
| | Alerting-driven pipelines that want operators to see the failure when the backend is unreachable longer than the WAL cap allows. |
| | Dashboards showing "the last N hours" — recency over history; tolerates silent eviction of the oldest samples better than a hard failure. |
> **Containerised Karaf**
> If `wal.path` is left empty, the WAL lands on the container’s ephemeral filesystem; set it explicitly to a mounted volume so buffered samples survive container replacement.
When to leave the WAL off

- Single-node OpenNMS with a highly-available local backend on the same box: disk I/O for samples that will definitely deliver is overhead.
- Short uptime / test environments where restart preservation is not a requirement.

The default stays `wal.enabled=false`. A later release may flip the default once the feature has soaked in real deployments.
Self-metrics
The WAL adds these counters and gauges (visible via `opennms:prometheus-writer-stats`):

- `wal_replay_samples_total`
- `samples_dropped_wal_full_total`
- `wal_batches_dropped_4xx_total`
- `wal_segments` (gauge)
- `wal_disk_bytes` (gauge)
The general write counters (samples_written_total,
samples_dropped_4xx_total, samples_dropped_5xx_total,
samples_dropped_transport_total) are unchanged and continue to apply.
Labels and enrichment
Default label set
For every sample, the plugin emits the following Prometheus labels when the corresponding source data is available:
| Label | Source | Notes |
|---|---|---|
| `onms_instance_id` | config | Only emitted when `instance.id` is set. |
| `job` | derived from `resourceId` | See Cross-source filtering with `job` and `instance`. |
| `__name__` | intrinsic | Sanitized to Prom’s metric-name grammar. |
| `resourceId` | intrinsic | Raw, lossless. |
| `node` | derived | `<foreignSource>:<foreignId>` when requisitioned, numeric dbId otherwise. |
| `instance` | same value as `node` | Prom-idiomatic subject-identity label for mixed-backend filtering. Emitted whenever `node` is. |
| `node_label` | external | Human-readable. Mutable — disable via config if churn is a concern. |
| `foreign_source` | external | Stable. |
| `foreign_id` | external | Stable. |
| `location` | external | OpenNMS monitoring location. |
| | parsed from `resourceId` | |
| | parsed from `resourceId` | |
| `if_name` | external | SNMP interface name. |
| `if_descr` | external | SNMP interface description. |
| | derived | Bits-per-second. |
| `onms_cat_<category>` | surveillance categories | One label per category. |
| `mtype` | meta tag | Metric type (`gauge`, `counter`, …). |
| `onms_attr_<key>` | plain-key Sample meta tags | MATE-scope tags (see Resource string attributes). |
| `onms_extattr_<key>` | plain-key Sample external tags | Resource string attributes attached by collectors (JMX bean Name properties, JDBC datasource labels). |
Deliberately excluded by default — available via labels.include if
you want them: if_alias (user-editable, churns), sys_descr,
sys_object_id, asset-record fields, OpenNMS metadata (see
Configuration reference for the metadata gating rules).
mtype round-trip and the read-time fallback
OpenNMS’s read-side graph renderer (NewtsConverterUtils.dataPointToRow)
unconditionally dereferences MetaTagNames.mtype on every Metric the
plugin returns. A Metric reaching that code without an mtype meta tag
trips a NullPointerException and the graph fetch returns HTTP 500.
To keep the round-trip working, the plugin:
- Emits `mtype` as a default label on write — sourced from the Sample’s `MetaTagNames.mtype` meta tag (the OpenNMS writer sets it on every Sample). Reserved against `labels.rename` collisions like the rest of the default allowlist.
- Synthesizes `mtype="gauge"` on read when a Prometheus response for a series lacks the label. This covers data already on disk from before the fix landed. Counter metrics in legacy data render as cumulative values rather than rates — visibly less informative but never wrong; new writes preserve the original mtype, so post-fix counter rendering is correct.
Operators who explicitly exclude mtype via labels.exclude = mtype will
break graph rendering for new writes — the synthesis fallback still
recovers those reads, but counter graphs degrade to gauges. The exclude
path is intended only for non-OpenNMS consumers of the same Prometheus
stack.
The samples_synthesized_mtype_total counter (visible via
opennms:prometheus-writer-stats) tracks every fallback synthesis. The
counter ticks once per Metric reconstruction — per matched series in
findMetrics, once per fetch in getTimeSeriesData — not once per
Sample. Watch it climb until Prometheus retention has aged out the
pre-fix data, then drop to flat: at that point every rendered graph is
using authentic mtype values from the writer.
If the counter rises indefinitely instead of plateauing, the most likely
cause is labels.exclude = mtype in your config — that’s a supported
operator override (intended for non-OpenNMS consumers of the same
Prometheus stack), but it means the read path falls back to synthesis on
every fetch indefinitely. Either remove the exclude rule or treat the
rising counter as expected for your deployment.
Resource string attributes (onms_attr_* and onms_extattr_*)
OpenNMS resource-graph templates substitute shell-style placeholders
like ${name}, ${datname}, and ${spcname} against string
attributes attached to a resource. On the integration-API write path,
those attributes arrive on the Metric partition system: meta tags
carry MATE-scope values, external tags carry collector-emitted resource
properties. The motivating case for the round-trip is that the
resource-string-attribute named name (the Eventd Processing Stats
row, the JDBC datasource label, etc.) collides with the intrinsic
name (metric-name) tag the plugin emits as name — and OpenNMS-
core’s TimeseriesResourceStorageDao.getStringAttributes() reads only
from Metric.getExternalTags() for placeholder substitution, so
partition fidelity is required end-to-end.
The plugin makes the round-trip work via two reserved label prefixes, one per partition:
- `onms_attr_<key>` carries the META partition (MATE-scope tags, `mtype` aside). Read side strips the prefix and deposits on `Metric.getMetaTags()`.
- `onms_extattr_<key>` carries the EXTERNAL partition (collector-emitted resource string attributes — the values placeholder substitution actually reads). Read side strips the prefix and deposits on `Metric.getExternalTags()`.
Concretely:
- Write — for each non-intrinsic partition, every Sample tag is emitted as `<prefix><sanitized_key>=<sanitized_value>` provided its key is non-empty, contains no `:` (context tags use `onms_meta_` instead — they’re owned by the metadata processor regardless of partition), is not `mtype`, is not blocked by the built-in plain-key secret denylist (`password`, `secret`, `token`, `snmp-community`, all case-insensitive), and is not already represented under a canonical default-label name (for the external-partition pass — the meta pass uses an empty consumed-keys set to preserve v0.4.0 behavior). The walks read the partition lists directly off the source Metric, bypassing the intrinsic-wins shadow merge that otherwise drops collisions.

  The plain-key denylist is deliberately narrower than the context-tag form (the context-tag denylist behind `metadata.label-prefix` also includes `*key`). Resource string attributes commonly shaped like `primary_key`, `partition_key`, or `foreign_key` are exactly the attributes that resource graphs substitute via `${…}` placeholders, so the plain-key path lets them through. Only credential-shaped names (`password` / `secret` / `token` / `snmp-community`) are blocked from the `onms_attr_` and `onms_extattr_` namespaces.

- Read — labels matching `onms_attr_<key>` reconstruct as a meta tag with key `<key>`; labels matching `onms_extattr_<key>` reconstruct as an external tag with key `<key>`. The prefixed forms are not also surfaced under their raw names — single source of truth, per partition.
> **Sanitization is one-way for non-identifier source keys**
> The plugin sanitizes the meta-tag key into the Prometheus label-name grammar (`[a-zA-Z0-9_]`), so distinct source keys can collapse to the same label name and the original key cannot be recovered on read.
> **No retroactive synthesis**
> Unlike the `mtype` read-time fallback, there is no synthesis for `onms_attr_*` / `onms_extattr_*`; series written before this feature carry no attribute labels to reconstruct.
Operators who want to opt out of either namespace altogether (cost-
sensitive backends, no resource-graph use case) can set
labels.exclude = onms_attr_* and / or labels.exclude = onms_extattr_*.
Excluding onms_extattr_* reverts resource-graph placeholder
substitution to literal placeholders; excluding onms_attr_* mostly
just drops MATE-scope label duplication. Other graphs are unaffected.
Label enrichment is two-sided
The default label allowlist is the write-side policy: it decides which OpenNMS-attached tags become Prometheus labels. The read-side — OpenNMS attaching those tags to samples in the first place — lives in OpenNMS itself, and it is off by default.
OpenNMS metatags config ─▶ MATE interpolation ─▶ sample tags ─▶ plugin label mapping ─▶ Prometheus labels
   (read-side, you)         (OpenNMS core)        (per Sample)    (this plugin)           (on the wire)
OpenNMS’s MetaTagDataLoader runs each configured value template through
the MATE interpolator against the sample’s scope and attaches a tag for
every non-empty result. The property names below and the MATE scope
syntax (${node:…}, ${interface:…}, ${service:…}, ${asset:…}) are
as of OpenNMS Horizon 35; upstream changes may rename properties in
future Horizon releases. If you don’t configure any
org.opennms.timeseries.tin.metatags.tag. properties, samples arrive at
this plugin carrying only name and resourceId* — and your Prometheus
series will have bare {resourceId="…"} labels regardless of what this
plugin is configured to emit.
Minimal metatag config
Put these four lines in etc/opennms.properties.d/metatags.properties
to enable node identity labels:
org.opennms.timeseries.tin.metatags.tag.nodeLabel = ${node:label}
org.opennms.timeseries.tin.metatags.tag.foreignSource = ${node:foreign-source}
org.opennms.timeseries.tin.metatags.tag.foreignId = ${node:foreign-id}
org.opennms.timeseries.tin.metatags.tag.location = ${node:location}
After OpenNMS reloads, samples carry those four tags. The plugin’s
default allowlist maps them to node_label, foreign_source,
foreign_id, and location, and derives node="<fs>:<fid>" from the
pair. For interface descriptors:
org.opennms.timeseries.tin.metatags.tag.ifName = ${interface:if-name}
org.opennms.timeseries.tin.metatags.tag.ifDescr = ${interface:if-description}
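The `node` derivation described above is simple to state precisely. An illustrative sketch (the `derive_node` helper is hypothetical, not plugin code):

```python
# node label derivation: FS-qualified identity when the node is
# requisitioned, otherwise the numeric database id.
def derive_node(foreign_source=None, foreign_id=None, db_id=None):
    if foreign_source and foreign_id:
        return f"{foreign_source}:{foreign_id}"
    return str(db_id) if db_id is not None else None

derive_node("NOC", "router-42")   # -> "NOC:router-42"
derive_node(db_id=7)              # -> "7"
```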
Surveillance categories
Categories are a separate opt-in on the OpenNMS side:
org.opennms.timeseries.tin.metatags.exposeCategories = true
Setting this causes OpenNMS to attach a categories sample tag
(comma-separated list of surveillance-category names). The plugin’s
default allowlist already expands that single tag into one
onms_cat_<sanitized-name> label per value — no additional plugin
config needed.
> **The node record must exist in OpenNMS**
> `${node:…}` (and `${asset:…}`) interpolation only yields values for samples whose resource resolves to a provisioned node; otherwise the metatags come back empty and the node-identity labels are absent.
Identifying samples from multiple OpenNMS instances
Running more than one OpenNMS instance against the same Prometheus-compatible backend? Two independent knobs exist, and they solve different problems:
| Knob | Header / label | What it does | Works with |
|---|---|---|---|
| `instance.id` | label `onms_instance_id` | Stamps every sample with a stable per-instance identifier. PromQL can filter (`{onms_instance_id="…"}`) and aggregate across instances. | Every Prometheus-compatible backend. |
| `tenant.org-id` | header `X-Scope-OrgID` | Partitions storage at the backend tier — each tenant’s data is isolated, queried separately. | Mimir, Cortex, VictoriaMetrics cluster, Thanos Receive. No-op against plain Prometheus and single-tenant VictoriaMetrics. |
When to use which
| Deployment | `instance.id` | `tenant.org-id` |
|---|---|---|
| Single OpenNMS → dedicated backend | not required | not required |
| Multiple OpenNMS → shared Prometheus / single-tenant VictoriaMetrics | required | n/a (no-op) |
| Multiple OpenNMS → Mimir / Cortex / VM cluster, fleet-wide queries | required | optional |
| Multiple OpenNMS → Mimir / Cortex / VM cluster, strict per-instance isolation | optional | required |
If you want both fleet-wide PromQL and backend-enforced isolation, set both.
Example
Two OpenNMS instances writing to the same Mimir cluster:
# etc/org.opennms.plugins.tss.prometheus-remote-writer.cfg on instance #1
instance.id = opennms-us-east
tenant.org-id = fleet-prod
# etc/org.opennms.plugins.tss.prometheus-remote-writer.cfg on instance #2
instance.id = opennms-us-west
tenant.org-id = fleet-prod
PromQL:
# All nodes, either OpenNMS
up{job="opennms"}
# Per-OpenNMS rollup
sum by (onms_instance_id) (rate(ifHCInOctets[5m]))
# Just the west instance
ifHCInOctets{onms_instance_id="opennms-us-west"}
Label pipeline — rename vs. copy vs. exclude
Both labels.rename and labels.copy produce a label under a new name,
but the mental model differs:
- `labels.rename` changes a label’s name. The original disappears.
- `labels.copy` adds a second name for a label. Both names remain present with the same value.
They run in a fixed pipeline:
defaults → exclude → include → copy → rename → metadata
Copy is one-pass (sees labels that exist at its stage entry; does not
recurse) and operates on pre-rename names. Reserved-target rules apply
symmetrically to both — a to value that collides with a default label
name, a reserved prefix (onms_cat_*, onms_meta_*), another rename
target, or another copy target is rejected at startup with an actionable
error.
Common labels.copy recipes
# Multi-tenant Mimir — emit `tenant` as a copy of `foreign_source` so
# per-requisition dashboards and the backend's tenant-id convention
# both key off the same value.
labels.copy = foreign_source -> tenant
# Migration-period dual emission — when changing a label name, copy the
# old name onto the new one for a release cycle so dashboards and alert
# rules can migrate gradually. Drop the copy once the rename lands.
labels.copy = node -> old_node_id
> **`labels.copy = node -> instance` is now redundant**
> Pre-0.2 deployments often copied `node` to `instance` by hand; since 0.2.0 the plugin emits `instance` as a default label with the same value, so the copy rule can be dropped.
If you want the value under a new name AND you want to drop the
original, use labels.rename — it does both in one directive. A
labels.copy source that doesn’t exist at copy time (typo, or a label
the plugin never emits on this deployment) produces a single startup
WARN naming the unknown source; it does not block startup.
Cross-source filtering with job and instance
Since 0.2.0 the plugin emits job and instance as default labels so
dashboards that compose OpenNMS data with node-exporter, OTel, or other
Prometheus data sources in the same backend can use the standard idiom:
# All OpenNMS-SNMP interface traffic for a specific node
{job="snmp", instance="NOC:router-42", __name__="ifHCInOctets"}
# Scope across data sources: everything about a host
{instance=~"NOC:router-42|10.0.0.1:9100"}
# Or by data source type
{job=~"snmp|node-exporter"}
The plugin’s instance value carries the OpenNMS-managed device
identity (<foreignSource>:<foreignId> when requisitioned, or the
numeric dbId), whereas node-exporter emits instance="<host>:<port>" —
same label name, different value shapes for the same physical device.
Cross-source value correlation (same label value across sources for the
same device) requires backend relabel_config; the shared label name
alone doesn’t bridge value shapes. job is the primary cross-source
scoping filter.
The job value is derived from each sample’s resourceId pattern:
- bracketed and slash-path SNMP-originated data → `snmp`
- `snmp/fs/…/jmx-*` or `opennms-jvm` groups (prefix match on the literal `jmx-`, not a shell glob) → `jmx`
- unparseable shapes → `opennms` catch-all
Set job.name = <constant> in the cfg to override the derivation with a
fleet-wide constant value (useful when you want every sample from one
plugin instance under the same job, e.g., job.name = opennms-prod).
> **The `opennms` catch-all**
> Samples whose `resourceId` doesn’t match any known shape land in `job="opennms"`.
Reserved rename / copy targets
The plugin rejects labels.rename and labels.copy entries whose
target would silently clobber an already-emitted label. Reserved
targets:
| Kind | Value | Why |
|---|---|---|
| Exact | `__name__` | Prometheus metric name. |
| Exact | `resourceId` | OpenNMS resource identifier (raw, lossless). |
| Exact | `node` | Derived FS-qualified or numeric node id. |
| Exact | `foreign_source`, `foreign_id` | Requisition identity. |
| Exact | `node_label` | Node’s human-readable name. |
| Exact | `location` | OpenNMS monitoring location. |
| Exact | | Parsed from `resourceId`. |
| Exact | `if_name`, `if_descr` | SNMP interface descriptors. |
| Exact | `job`, `instance` | Default labels (since 0.2.0). |
| Exact | `onms_instance_id` | Multi-instance origin stamp (reserved even when `instance.id` is unset). |
| Prefix | `onms_cat_` | Per-surveillance-category expansion. |
| Prefix | `onms_meta_` | Default metadata-passthrough prefix. |
| Prefix | `onms_attr_` | Resource string attributes — meta partition (see Resource string attributes (`onms_attr_*` and `onms_extattr_*`)). |
| Prefix | `onms_extattr_` | Resource string attributes — external partition (see Resource string attributes (`onms_attr_*` and `onms_extattr_*`)). |
Duplicate rename targets (foo → cluster, bar → cluster) and
duplicate from keys (a → cluster, a → tenant) are also rejected.
When multiple rename or copy entries have errors, the plugin reports all
of them in one startup error so you fix once and restart once.
> **`onms_meta_*` reservation covers only the default prefix**
> The reserved-prefix list names the default `onms_meta_`; a custom `metadata.label-prefix` is protected by its own collision check (next note) rather than this list.
metadata.label-prefix itself is now collision-checkedWhile the |
Sanitization rules
The plugin sanitizes every metric name, label name, and label value to conform to the Prometheus text model before serialization:
- Metric names: characters outside `[a-zA-Z0-9_:]` are replaced with `_`; a leading digit is replaced with `_`.
- Label names: characters outside `[a-zA-Z0-9_]` are replaced with `_`; a leading digit is replaced with `_`.
- Label values: truncated to the first 2048 bytes if longer.
- `NaN`, `+Infinity`, and `-Infinity` sample values are dropped before serialization (`samples_dropped_nonfinite_total` increments).
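As an illustration (not the plugin's actual code), the metric-name rule can be sketched in Python; the `_` replacement character is an assumption consistent with the target character class:

```python
import re

def sanitize_metric(name: str) -> str:
    """Sketch of the documented metric-name rule: characters outside
    [a-zA-Z0-9_:] become '_', and a leading digit is replaced with '_'."""
    s = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    if s[:1].isdigit():
        s = "_" + s[1:]  # leading digit replaced with '_'
    return s

print(sanitize_metric("ifInOctets.0/eth0"))  # ifInOctets_0_eth0
```

Label names follow the same shape with `[a-zA-Z0-9_]` (no colon).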
Backend compatibility
The plugin speaks Prometheus Remote Write v1 and v2, and every mainstream Remote Write backend accepts at least one of the two. See Wire protocols (v1 and v2) for the wire-format selection knob and the Prometheus 2.50–2.54 caveat.
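The knob itself is a single cfg key; per the startup validator, the only accepted values are 1 and 2. A minimal fragment (key name from this document; file placement not shown here):

```properties
wire.protocol-version = 1   # drop to 1 when targeting the v2-buggy Prometheus 2.50-2.54 range
```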
CI matrix
| Backend | v1 (Prom 2.53.2 reference) | v2 (Prom 3.0.1 reference) | Coverage |
|---|---|---|---|
| Prometheus | ✅ | ✅ | Testcontainers integration test |
| Grafana Mimir | ✅ | ✅ (…) | e2e smoke (…) |
| VictoriaMetrics | ✅ | ✅ (with v2 ingest enabled) | e2e smoke (…) |
| Cortex | Compatible — not in CI | Compatible — not in CI | |
| Thanos Receive | Compatible — not in CI | Compatible — not in CI | |
| Grafana Cloud | Compatible — not in CI | Compatible — not in CI | |
read.url shapes per backend
The plugin appends /api/v1/series and /api/v1/query_range itself —
do not include /api/v1 in the configured value.
| Backend | read.url |
|---|---|
| Prometheus | server root, e.g. `http://prom:9090` |
| Grafana Mimir | server root + `/prometheus`, e.g. `http://mimir:9009/prometheus` |
| VictoriaMetrics | server root, e.g. `http://vm:8428` |
| Cortex | |
| Thanos Receive (Query) | |
| Grafana Cloud | |
Tenancy notes
tenant.org-id sets the X-Scope-OrgID header. Behavior per backend:
| Backend | Honors `X-Scope-OrgID`? | Notes |
|---|---|---|
| Prometheus (vanilla) | No (no-op) | Single-tenant; partition with `instance.id`. |
| VictoriaMetrics single | No (no-op) | Single-tenant. |
| VictoriaMetrics cluster | Yes | Routes to per-tenant ingester. |
| Grafana Mimir | Yes | Required if Mimir is configured with multi-tenancy enabled. |
| Cortex | Yes | Same model as Mimir. |
| Thanos Receive | Yes | Tenancy via the configured tenant header. |
| Grafana Cloud | Yes | The Cloud-issued credentials encode the tenant. |
See Labels and enrichment for the instance.id vs tenant.org-id
decision matrix when running multiple OpenNMS instances against the same
backend.
Out of scope (current line)
- Native histograms and exemplars — the v2 wire layer reserves the fields, but the plugin doesn’t populate them. OpenNMS doesn’t surface histogram data through the TSS SPI today, and there’s no trace-ID source for exemplars on the OpenNMS side. Out of scope, not blocked by the wire format.
- Per-series metadata (`help`, `unit`, `created_timestamp`) — same reasoning as histograms: v2 reserves the fields; no source-side population today.
- mTLS client certificates — Basic, Bearer, and tenant-id header cover the common deployment shapes. Client-cert auth is a candidate for a future release if demand materializes.
- Per-tenant routing / multi-destination fan-out — one `write.url` and one `tenant.org-id` per plugin instance. For multi-destination, run multiple OpenNMS instances; an in-process fan-out remains a future-release candidate.
- Migration tooling from `opennms-cortex-tss-plugin` — not in scope. Recommended migration shape: stand both plugins up, dual-write for a period, switch queries once the new labels are established, uninstall cortex-tss. No in-product tooling.
- Per-series `delete()` — Prometheus Remote Write has no delete semantic. `delete(Metric)` is a no-op that logs a rate-limited WARN. Configure retention at the backend tier (Prometheus `--storage.tsdb.retention`, Mimir/VictoriaMetrics compactor).
- Full OpenNMS TSS compliance-suite pass — the compliance suite’s `shouldDeleteMetrics` and whole-`Metric` partition-equality assertions conflict with this plugin’s design. `PrometheusComplianceIT` skips the conflicting tests with documented `@Ignore` reasons.
Operations
Self-metrics
The plugin exposes internal operational metrics via a Dropwizard registry
and prints them through the Karaf shell command
opennms:prometheus-writer-stats.
Throughput and drops
| Counter | What it counts |
|---|---|
| `samples_written_total` | Samples successfully delivered to the endpoint. |
| `samples_dropped_4xx_total` | Samples in batches the endpoint rejected with 4xx. |
| | Samples in batches that exhausted the 5xx retry budget. |
| `samples_dropped_transport_total` | `IOException` / socket failures (distinct from 5xx so you can alert separately). |
| | Samples rejected because the in-memory queue (`queue.capacity`) is full. |
| | Samples rejected (or evicted under …). |
| `samples_dropped_nonfinite_total` | Samples whose value was `NaN`, `+Infinity`, or `-Infinity`. |
| | Same-timestamp same-series dedup (last-write-wins). |
| `samples_unparseable_resource_id_total` | Samples whose `resourceId` matched no parser grammar. |
| | Reads where the Prometheus response lacked an … |
Pipeline state
| Gauge | What it shows |
|---|---|
| | Current in-memory queue occupancy (when …). |
| | WAL segment count on disk (when `wal.enabled=true`). |
| | Total WAL footprint on disk. |
| | Running + queued HTTP requests at the dispatcher. |
HTTP
| Counter | What it counts |
|---|---|
| | Total payload bytes (post-snappy) sent. |
| `http_writes_successful_total` | 2xx responses. |
| | Non-2xx + transport failures. |
Other
| Counter | What it counts |
|---|---|
| | Metadata keys blocked by the built-in credential denylist. |
| | Samples replayed from the WAL on startup recovery. |
Recommended alerts
These are starting points — tune to your deployment’s noise floor.
| Alert idea | Sketch |
|---|---|
| Drop rate jumps | |
| WAL filling up | |
| Queue full (no-WAL deployments) | |
| Endpoint persistently unhappy | |
| TLS skip-verify left on in production | log-based — the plugin emits a WARN every hour |
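As one concrete starting point, a drop-rate sketch in PromQL, assuming the plugin’s self-metrics have been re-exported to your Prometheus backend under the counter names above (the threshold is a placeholder):

```promql
# any sustained drops on 4xx or transport errors
rate(samples_dropped_4xx_total[5m]) + rate(samples_dropped_transport_total[5m]) > 0
```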
Log levels
Set in etc/org.ops4j.pax.logging.cfg or via the Karaf shell
(log:set DEBUG org.opennms.plugins.tss.prometheusremotewriter).
| Level | What you see |
|---|---|
| INFO | Plugin lifecycle (start / stop), config-change diff, WAL startup recovery summary, … |
| WARN | 4xx response bodies (truncated), … |
| DEBUG | Per-batch flush results, retry timing, label-pipeline diff for the first sample after a config reload. Verbose — use sparingly. |
Capacity sizing
These are rough rules-of-thumb, not hard requirements. Measure your deployment.
| Sample rate | `queue.capacity` / WAL size | Notes |
|---|---|---|
| < 5k samples/sec | defaults | |
| 5k–25k samples/sec | … | Backpressure if your network RTT is high. |
| 25k+ samples/sec | enable WAL with … | Single-process Karaf is the bottleneck before the wire; consider sharding by source. |
WAL footprint scales with outage tolerance: the wire format is snappy-compressed, but the WAL is not, and each sample averages roughly 30–80 bytes pre-snappy. At 10k samples/sec that is about 0.3–0.8 MB/s of WAL growth, so plan for roughly 180–480 MB of disk in-flight if you want to survive a 10-minute outage.
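The arithmetic behind that rule of thumb, using the figures above:

```python
rate = 10_000                    # samples/sec
bytes_low, bytes_high = 30, 80   # bytes per sample, pre-snappy (the WAL is uncompressed)
outage_s = 10 * 60               # a 10-minute outage

low_mb = rate * bytes_low * outage_s / 1_000_000
high_mb = rate * bytes_high * outage_s / 1_000_000
print(low_mb, high_mb)  # 180.0 480.0 -> plan for roughly 180-480 MB of WAL
```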
Karaf shell commands
| Command | Purpose |
|---|---|
| `opennms:prometheus-writer-stats` | Print all counters and gauges. |
| `bundle:list \| grep prometheus-remote-writer` | Confirm the bundle is active and resolved. |
| | Inspect the current effective config (any value not in the cfg file falls through to its default). |
Self-monitoring with the plugin itself
The plugin’s metrics are exposed via Dropwizard, which means you can scrape the Karaf JVM with a JMX exporter and feed those self-metrics back into the same Prometheus backend the plugin writes to. That gives you one Grafana dashboard with both OpenNMS data and the plugin’s own operational health.
A future release may add a built-in HTTP scrape endpoint to remove the JMX-exporter step. Not on the v0.4 roadmap.
Troubleshooting
Plugin won’t start
Karaf shell shows the bundle as Failure or unresolved.
- Check the Karaf log — `tail /opt/opennms/data/log/karaf.log` (or wherever your distribution writes it). Configuration validators emit actionable error messages naming the offending key.
- Validators that fail at startup (the plugin refuses to start):

| Symptom | Likely cause |
|---|---|
| `labels.rename target 'X' collides with the default label 'X'` | Pick a non-reserved target — see Labels and enrichment. |
| `labels.copy target 'X' collides …` | Same as above; the rules apply symmetrically to copy. |
| `wire.protocol-version=3 is not a valid value` | Set to `1` or `2`. |
| `Both auth.basic.* and auth.bearer.token are configured` | They are mutually exclusive — pick one. |
| `instance.id contains a control character` / `… exceeds 2048 bytes` | Validation rejects unprintable / oversized values. |
| `WAL path is not writable: …` | The Karaf user can’t write to `wal.path`. Check ownership / SELinux / mount options. |
| `Corrupt checkpoint.json: …` | Recover by stopping the plugin and removing the WAL directory. |
- Check OSGi feature resolution — `feature:list -i | grep prometheus` in Karaf. Missing transitive features (older Karaf base) will appear as `Unsatisfied`.
Backend returns 4xx for every batch
The endpoint is rejecting the format. This is usually one of:
- Wire-format mismatch — `wire.protocol-version=2` against a v1-only or v2-but-buggy backend (the Prometheus 2.50–2.54 trap; see Wire protocols (v1 and v2)). Drop to v1 to confirm.
- Auth — Bearer token expired, Basic auth incorrect, or `tenant.org-id` not whitelisted on the backend. The 4xx response body is logged at WARN — read it.
- Sample rejection — Mimir / Cortex limit on series-per-tenant or label-name length. Look for `out of order sample`, `max-series-per-user`, `label name too long` in the logged response body.
samples_dropped_4xx_total increments by the rejected batch size each
time. If you’re seeing a steady rate, the deployment isn’t recovering
on its own — fix the underlying cause.
Backend returns 5xx persistently
The plugin retries 5xx responses with exponential backoff up to
retry.max-attempts (default 5), then drops. The same batch is
re-enqueued only when wal.enabled=true — without the WAL, exhausted
5xx batches are gone.
- Endpoint overloaded — Mimir ingester hot-spots, Prometheus TSDB compaction storms. Inspect the backend’s own metrics first.
- Network path saturated — `samples_dropped_transport_total` will rise alongside 5xx if upstream is the bottleneck. Confirm with `iftop` / `nethogs` on the OpenNMS host.
If the outage is short enough to fit in wal.max-size-bytes, enabling
the WAL turns 5xx outages into delivery delays instead of drops. See
Write-Ahead Log.
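A sketch of the relevant keys (names from this document; the size value is an example, not a recommendation):

```properties
wal.enabled = true
wal.max-size-bytes = 1073741824   # 1 GiB outage buffer
retry.max-attempts = 5            # the default per the text above
```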
Series cardinality exploded
Three usual culprits:
- `node_label` churn — node renames create new series. Drop with `labels.exclude = node_label` if renames are routine.
- `if_descr` churn — vendor-generated; firmware upgrades change it. Same fix: `labels.exclude = if_descr`.
- Large `onms_cat_*` fan-out — nodes with many surveillance categories add one label each per series. If your cardinality budget is tight, consider whether all categories need to be promoted to labels.
labels.include = * is rarely correct in production — it surfaces every
non-default source tag. The default allowlist is deliberately narrow.
job="opennms" proportion is high
Samples whose resourceId matches none of the parser grammars
(bracketed, slash-FS, slash-DB) fall through to job="opennms". The
samples_unparseable_resource_id_total counter tracks the rate.
A rising counter is a signal that:
- a new OpenNMS collector is emitting a `resourceId` shape the parser doesn’t recognise yet, or
- a parser regression has shipped.
Either way, file an issue with example resourceId strings — the parser
is the project’s responsibility, not the operator’s.
WAL directory is missing across restart
Symptom: wal.enabled=true, but every restart starts from an empty WAL
and you’ve lost the durability guarantee.
Cause: containerised Karaf with no mounted volume. ${karaf.data} is
ephemeral — the default wal.path evaporates on container restart.
Fix: set wal.path explicitly to a path that’s mounted from a
persistent volume.
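A minimal Docker Compose sketch of that fix (the service name, volume name, and mount point are illustrative; `wal.path` is the plugin key from the text):

```yaml
services:
  core:
    volumes:
      - opennms-wal:/opennms-wal   # survives container restarts

volumes:
  opennms-wal:
```

paired with `wal.path = /opennms-wal` in the plugin cfg.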
Plugin starts but no samples reach the backend
- Confirm OpenNMS sees the plugin as the active TSS: `org.opennms.timeseries.strategy = integration` in `etc/opennms.properties.d/timeseries.properties`. An OpenNMS restart is required for the strategy switch — confirm with the OpenNMS log.
- Confirm collectors are running and producing samples — `karaf@root()> log:tail`, look for `collectd` activity.
- Check `samples_written_total` and `http_writes_successful_total` from `opennms:prometheus-writer-stats`. If both are 0, samples aren’t reaching the plugin (an OpenNMS-side issue). If `samples_written_total` rises but the backend has nothing, check `wire.protocol-version`, the Prometheus 2.50–2.54 trap (Wire protocols (v1 and v2)), and the `samples_dropped_*` counters.
TLS skip-verify left on by accident
The plugin emits a WARN on startup and every hour when
tls.insecure-skip-verify=true. Search for that log line:
grep 'tls.insecure-skip-verify' /opt/opennms/data/log/karaf.log
Production deployments should always have a valid CA chain. Use
tls.ca-file to point at a private bundle if your backend uses an
internal CA.
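A hedged cfg fragment for an internal CA (key names from this document; the path is an example):

```properties
tls.insecure-skip-verify = false
tls.ca-file = /opt/opennms/etc/internal-ca.pem
```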
Where to ask
This is an incubation project — community channels first; there is no commercial support yet.
- Discussion / questions / use-case help — opennms.discourse.group.
- Bug reports / feature requests — open an issue at https://github.com/opennms-forge/prometheus-remote-writer/issues.

Include with any report:

- Plugin version (`opennms:prometheus-writer-stats` prints it).
- Backend type and version.
- Relevant config (with secrets redacted).
- The 4xx response body if applicable, or a short Karaf log excerpt for startup failures.
End-to-end sandbox
A self-contained Docker Compose stack for manually exercising
OpenNMS Prometheus Remote Writer against real Prometheus-compatible backends. Lives under
e2e/ in the repo and ships with a per-backend smoke harness wired into
the project Makefile.
Use it to:
- Try the plugin against a fresh Prometheus / Mimir / VictoriaMetrics before committing to a production install.
- Iterate on plugin code locally with a real OpenNMS Horizon container in the loop.
- Reproduce a backend-specific issue someone reported, with a known-good reference stack.
|
The sandbox is single-core by design — no Minion, no Sentinel, no ActiveMQ/Kafka, no TLS. |
Layout
e2e/
├── compose.base.yml # shared: postgres + core + grafana
├── compose.prometheus.yml # extends base, adds prometheus + per-backend mounts
├── compose.mimir.yml # extends base, adds mimir + per-backend mounts
├── compose.victoriametrics.yml # extends base, adds vm + per-backend mounts
├── opennms/
│ ├── opennms.properties.d/
│ │ └── timeseries.properties # activates TSS integration strategy
│ ├── prometheus.cfg # plugin cfg for the prometheus backend
│ ├── mimir.cfg # plugin cfg for the mimir backend
│ └── victoriametrics.cfg # plugin cfg for the vm backend
├── grafana/
│ └── datasources/ # one file per backend; the matching
│ ├── prometheus.yml # compose.<backend>.yml mounts it to
│ ├── mimir.yml # Grafana's provisioning dir
│ └── victoriametrics.yml
├── prometheus/
│ └── prometheus.yml # minimal Prom config with remote-write receiver
└── mimir/
└── mimir.yaml # single-binary Mimir config
Prerequisites
- Docker 24+ with Compose v2.
- Build the KAR first so the `core` container can pick it up from the mounted `assembly/kar/target`: `make kar`
Running
One compose file per backend — nothing else to match up. The base
services (postgres, core, grafana) are defined once in
compose.base.yml; each backend file extends: them and appends the
backend-specific plugin cfg and Grafana datasource mounts.
# Prometheus
docker compose -f e2e/compose.prometheus.yml up -d
# Grafana Mimir
docker compose -f e2e/compose.mimir.yml up -d
# VictoriaMetrics
docker compose -f e2e/compose.victoriametrics.yml up -d
First boot of the core container can take several minutes while
OpenNMS creates the database and loads features. Watch for:
Starting Karaf...
Endpoints
| Service | URL | Default credentials |
|---|---|---|
| OpenNMS Web UI | http://localhost:8980/opennms/ | `admin` / `admin` |
| OpenNMS Karaf SSH | `ssh -p 8101 admin@localhost` | `admin` / `admin` |
| Grafana | http://localhost:3000/ | `admin` / `admin` |
| Prometheus UI | http://localhost:9090/ (when active) | — |
| Mimir UI | http://localhost:9009/ (when active) | tenant `e2e` |
| VictoriaMetrics UI | http://localhost:8428/ (when active) | — |
Grafana auto-provisions a datasource pointing at whichever backend is active (selected by the compose file you brought up). Open Explore → OpenNMS (<backend>) to run PromQL against the data OpenNMS just wrote.
Smoke test (automated)
The smoke harness lives entirely in the project Makefile. Per project
convention, CI invokes make smoke directly.
make smoke # default backends: prometheus, mimir, victoriametrics
make smoke BACKENDS=prometheus # single backend
make smoke BACKENDS="mimir victoriametrics"
make smoke-prometheus # convenience wrapper, equivalent to BACKENDS=prometheus
make smoke SMOKE_TIMEOUT=300 # tighter deadline (default 600s per backend)
make smoke SMOKE_POLL=5 # tighter poll interval (default 15s)
Each backend is brought up, polled for > 0 ingested series, and torn
down. The target depends on kar, so a fresh KAR is built first.
Pass/fail summary is printed at the end; on a timeout, the last 40
lines of the relevant container’s karaf.log are dumped before
teardown.
Verifying the plugin is active
# Karaf shell (default admin/admin)
ssh -p 8101 admin@localhost
Inside Karaf:
karaf@root()> feature:list | grep prometheus-remote-writer
karaf@root()> bundle:list | grep prometheus-remote-writer
karaf@root()> opennms:prometheus-writer-stats
For what opennms:prometheus-writer-stats reports, see
Operations.
Querying the backend
Once OpenNMS has collected a few samples (default interval: 5 minutes on a fresh provisioning), query the backend directly.
Prometheus:
curl 'http://localhost:9090/api/v1/series?match%5B%5D={__name__=~".%2B"}' | jq .
curl 'http://localhost:9090/api/v1/query?query=up' | jq .
Mimir (requires the X-Scope-OrgID header; tenant is e2e per the cfg):
curl -H 'X-Scope-OrgID: e2e' \
'http://localhost:9009/prometheus/api/v1/series?match%5B%5D={__name__=~".%2B"}' | jq .
VictoriaMetrics:
curl 'http://localhost:8428/api/v1/series?match%5B%5D={__name__=~".%2B"}' | jq .
Tear down
Use the same -f you brought the stack up with:
docker compose -f e2e/compose.prometheus.yml down -v --remove-orphans
-v removes the named data volumes (postgres, opennms, prometheus,
mimir, vm). Drop -v if you want to keep state across restarts.
Iterating on the plugin
The assembly/kar/target directory is mounted read-only into
/opt/opennms/deploy/. A rebuild of the KAR (make kar from the repo
root) does not auto-reload the plugin — Karaf’s hot-deploy watches
file timestamps, but the container sees the mount at a point in time.
To reload a freshly built KAR:
# From the repo root
make kar
# Restart only the core container (use whichever compose file is active)
docker compose -f e2e/compose.prometheus.yml restart core
Or, inside the Karaf shell, feature:uninstall + feature:install
cycles the plugin without restarting the container.
What’s NOT exercised here
- Minion / remote pollers — this is a single-core sandbox.
- ActiveMQ / Kafka messaging — not needed for the local TSS path.
- TLS / auth to the backend — all cleartext on the compose network.
- Multi-tenant routing beyond Mimir’s default `e2e` tenant.
- Dashboards — Grafana is provisioned with a datasource only; build dashboards on top in Explore or by dropping JSON under a `grafana/dashboards/` provisioning directory.