What's new in v0.5.19
v0.5.19 closes 3 observability gaps in the daemon surfaced by downstream validation work on the simulation lab. Standard process collector metrics (process_resident_memory_bytes, process_open_fds, process_start_time_seconds, process_cpu_seconds_total, ...) are now exposed on /metrics on Linux, so operators get RSS and FD pressure visibility without depending on an external metrics-server. A new perf_sentinel_otlp_rejected_total{reason} counter quantifies OTLP backpressure with 3 labels (unsupported_media_type, parse_error, channel_full), each pre-warmed to 0 at startup so dashboards plot the zero-line before the first rejection. And Report.warning_details: Vec<Warning> adds a structured {kind, message} channel alongside the legacy Report.warnings: Vec<String> field, populated by the daemon cold-start path (kind="cold_start") and dynamically by /api/export/report from the rejected counter (kind="ingestion_drops").
The 3 fixes are purely additive on the observability layer. Ingestion behavior does not change: requests that were accepted before are still accepted, and requests that were rejected before are still rejected with the same status codes. The only difference is that rejections are now visible in /metrics and surfaced in the report payload. The legacy Report.warnings field is preserved byte-for-byte. Renderers prefer warning_details when non-empty and fall back to warnings otherwise. Pre-0.5.19 baselines parse fine thanks to serde(default, skip_serializing_if = "Vec::is_empty") on the new field.
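The renderer fallback described above can be sketched as follows. This is a minimal illustration, not the project's code: the Report struct is reduced to its two warning fields, and render_warnings and its "[kind] message" output format are hypothetical names and choices made for this example only.

```rust
/// Structured warning, mirroring the {kind, message} shape described above.
#[derive(Debug, Clone, PartialEq)]
pub struct Warning {
    pub kind: String,
    pub message: String,
}

/// Simplified stand-in for the report payload (assumption: only the two
/// warning fields are modeled here, the real Report carries much more).
#[derive(Debug, Default)]
pub struct Report {
    pub warnings: Vec<String>,
    pub warning_details: Vec<Warning>,
}

/// Hypothetical renderer helper: prefer the structured channel when it is
/// non-empty, otherwise fall back to the legacy strings unchanged.
pub fn render_warnings(report: &Report) -> Vec<String> {
    if report.warning_details.is_empty() {
        report.warnings.clone()
    } else {
        report
            .warning_details
            .iter()
            .map(|w| format!("[{}] {}", w.kind, w.message))
            .collect()
    }
}
```

A pre-0.5.19 baseline has no warning_details key, so after deserialization the vector is empty and the legacy branch is taken.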
The release also lands the supply-chain pinning policy documentation as docs/SUPPLY-CHAIN.md (and its FR mirror), which formalizes the project's stance: GitHub Actions are pinned by SHA in .github/workflows/, Helm chart CLI invocations stay on latest (lower-risk surface), and the docs/ci-templates SHA drift versus upstream is accepted by design. The doc is the reference for contributors wondering why some pins look frozen and others do not.
Added
- perf_sentinel_otlp_rejected_total{reason} counter on /metrics (crates/sentinel-core/src/report/metrics.rs). 3 reason labels: unsupported_media_type (HTTP only, Content-Type is not application/x-protobuf), parse_error (HTTP only, prost decode failed), channel_full (HTTP and gRPC, event channel saturated). All pre-warmed to 0 at startup. payload_too_large is intentionally absent: tower-http and tonic enforce the cap upstream and reject before the application handler runs.
- Process collector metrics on /metrics (Linux only): process_resident_memory_bytes, process_virtual_memory_bytes, process_open_fds, process_max_fds, process_start_time_seconds, process_cpu_seconds_total. Registered via prometheus::process_collector::ProcessCollector::for_self() behind #[cfg(target_os = "linux")] so the macOS and Windows builds do not pay for failed /proc/self/* reads on every scrape.
- Report.warning_details: Vec<Warning> field on the report payload, with Warning { kind: String, message: String } defined in the new crates/sentinel-core/src/report/warnings.rs module. Two kind values ship in 0.5.19: cold_start (returned by /api/export/report until the first batch lands) and ingestion_drops (computed dynamically from otlp_rejected_total{channel_full} when positive). Renderers prefer the structured field when non-empty and fall back to the legacy Report.warnings: Vec<String> (0.5.16+) otherwise.
- Warning::from_untrusted(kind, message) constructor that strips Unicode BiDi-override and invisible-format characters via report::sarif::strip_bidi_and_invisible. Trojan Source defense (CVE-2021-42574) for future contributors wiring a Warning sourced from an OTLP attribute or any other attacker-influenced channel. Documented as the required entry point for untrusted bytes in the module-level doc comment.
- docs/METRICS.md and docs/FR/METRICS-FR.md: exhaustive reference for every metric exposed on /metrics, including a per-scrape cost note for the new process collector (FD walk dominates at thousands of long-lived connections) and an exposure scope note recommending Kubernetes NetworkPolicy plus Prometheus mTLS when the daemon binds to 0.0.0.0.
- docs/SUPPLY-CHAIN.md and docs/FR/SUPPLY-CHAIN-FR.md (#10): the pinning policy reference. Documents what gets SHA-pinned (.github/workflows/ actions), what stays on latest (Helm CLI lints, lower-risk because no repo perms or secrets access), and the docs/ci-templates drift acceptance.
- "Diagnosing OTLP drops" and "Reading Report warnings" sections in docs/RUNBOOK.md and docs/FR/RUNBOOK-FR.md: operator recipes for cross-checking the new counter against process metrics and the warning_details payload.
- 14 new tests across report::warnings, report::mod, report::metrics, ingest::otlp, daemon::query_api, plus 1 e2e test in crates/sentinel-cli/tests/e2e.rs that pins the JSON shape of Report.warning_details. Includes a #[cfg(not(target_os = "linux"))] symmetric test that locks the platform gating of the process collector.
- crate::test_helpers::empty_report() factory for unit tests that need a default Report shape, replacing the long boilerplate at every call site.
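To illustrate the idea behind Warning::from_untrusted, here is a minimal sketch of a sanitizing constructor. The character set below is an assumption chosen for the example (common BiDi controls and zero-width characters). The real report::sarif::strip_bidi_and_invisible may cover a different set, and the signatures shown here are not confirmed by the release notes.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Warning {
    pub kind: String,
    pub message: String,
}

/// Assumed character set for this sketch, not the project's actual list.
fn is_bidi_or_invisible(c: char) -> bool {
    matches!(
        c,
        '\u{202A}'..='\u{202E}'      // BiDi embeddings and overrides
            | '\u{2066}'..='\u{2069}' // BiDi isolates
            | '\u{200E}'
            | '\u{200F}'
            | '\u{061C}'              // directional marks
            | '\u{200B}'..='\u{200D}' // zero-width space and joiners
            | '\u{2060}'
            | '\u{FEFF}'              // word joiner, zero-width no-break space
    )
}

/// Stand-in for the stripping helper named in the notes above.
pub fn strip_bidi_and_invisible(input: &str) -> String {
    input.chars().filter(|c| !is_bidi_or_invisible(*c)).collect()
}

impl Warning {
    /// Entry point for attacker-influenced text. Sanitizing both fields is
    /// an assumption of this sketch.
    pub fn from_untrusted(kind: &str, message: &str) -> Self {
        Warning {
            kind: strip_bidi_and_invisible(kind),
            message: strip_bidi_and_invisible(message),
        }
    }
}
```

Filtering at construction time means every downstream renderer (terminal, JSON, SARIF) receives text whose displayed order matches its byte order.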
Changed
- MetricsState caches the 3 OTLP rejection counters as IntCounter fields (otlp_rejected_unsupported_media_type, otlp_rejected_parse_error, otlp_rejected_channel_full). record_otlp_reject(reason) becomes a branchless match plus atomic inc(), no per-rejection HashMap label lookup. Avoids amplifying daemon slowdown via metric overhead under a backpressure storm. The IntCounterVec is kept on the struct for /metrics rendering and tests, only the hot path uses the cached children.
- otlp_http_router and OtlpGrpcService::new accept Option<Arc<MetricsState>> as a new parameter (crates/sentinel-core/src/ingest/otlp.rs). Some(metrics) in daemon mode (passed through daemon/listeners.rs), None for batch CLI and tests so the existing call sites stay zero-cost. Each rejection site (HTTP unsupported_media_type, HTTP parse_error, HTTP channel_full, gRPC channel_full) calls m.record_otlp_reject(reason) when the metrics handle is present.
- docs/ci-templates/ PERF_SENTINEL_VERSION pin bumped from 0.5.17 to 0.5.18 across gitlab-ci.yml, github-actions.yml, github-actions-baseline.yml, and jenkinsfile.groovy. Materializes in this release so users curling the templates pull the recent binary by default.
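The cached-counter hot path and the optional metrics handle can be sketched with the standard library alone. This is an illustration under stated assumptions: AtomicU64 stands in for the prometheus IntCounter, the RejectReason enum and the rejected and on_channel_full helpers are hypothetical, and the real record_otlp_reject signature may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical enum for the three reason label values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RejectReason {
    UnsupportedMediaType,
    ParseError,
    ChannelFull,
}

/// One pre-resolved counter per reason. Default starts each at 0, which
/// mirrors the pre-warming at startup.
#[derive(Debug, Default)]
pub struct MetricsState {
    otlp_rejected_unsupported_media_type: AtomicU64,
    otlp_rejected_parse_error: AtomicU64,
    otlp_rejected_channel_full: AtomicU64,
}

impl MetricsState {
    fn counter(&self, reason: RejectReason) -> &AtomicU64 {
        match reason {
            RejectReason::UnsupportedMediaType => &self.otlp_rejected_unsupported_media_type,
            RejectReason::ParseError => &self.otlp_rejected_parse_error,
            RejectReason::ChannelFull => &self.otlp_rejected_channel_full,
        }
    }

    /// Hot path: a match on the reason plus one atomic increment, with no
    /// map lookup or allocation per rejection.
    pub fn record_otlp_reject(&self, reason: RejectReason) {
        self.counter(reason).fetch_add(1, Ordering::Relaxed);
    }

    /// Read-side helper for this sketch only.
    pub fn rejected(&self, reason: RejectReason) -> u64 {
        self.counter(reason).load(Ordering::Relaxed)
    }
}

/// Hypothetical rejection site: records only when a handle is present, so
/// callers passing None (batch CLI, tests) skip the metrics work entirely.
pub fn on_channel_full(metrics: Option<&Arc<MetricsState>>) {
    if let Some(m) = metrics {
        m.record_otlp_reject(RejectReason::ChannelFull);
    }
}
```

Resolving the counters once at construction keeps the cost of each rejection constant, which matters most when rejections are frequent.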
Behavior
- No change to ingestion behavior. Requests that were accepted before are still accepted, rejected requests still return the same status codes (415, 400, 503 HTTP, INTERNAL gRPC). The difference is that rejections are now visible in /metrics and Report.warning_details.
- Backward compatibility on Report JSON. The new warning_details field is additive via serde(default, skip_serializing_if = "Vec::is_empty"). Pre-0.5.19 baselines saved with report --before <baseline.json> parse without modification. The legacy warnings: Vec<String> field (0.5.16+) is preserved byte-for-byte, populated as before by the daemon cold-start path. Renderers prefer warning_details when non-empty and fall back to warnings otherwise.
- Process metrics are Linux only. Operators on macOS and Windows hosts continue to see the perf_sentinel_* metrics and nothing else under process_*. The prometheus crate's process feature is now activated, but the registration site is gated by #[cfg(target_os = "linux")] so non-Linux scrapes do not pay for failed /proc/self/* reads.
- Built artifacts are slightly larger. Activating the process feature pulls procfs as a transitive dependency on Linux. A few KB on the binary, no runtime cost off the scrape path.
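The platform gating can be illustrated with a standard-library sketch. It does not use the prometheus crate: open_fds and render_process_metrics are hypothetical helpers that only show how a #[cfg(target_os = "linux")] gate keeps /proc reads out of non-Linux builds.

```rust
/// Linux: count this process's open file descriptors by listing
/// /proc/self/fd. The directory handle used for the listing is itself
/// counted, which is acceptable for a sketch.
#[cfg(target_os = "linux")]
pub fn open_fds() -> Option<u64> {
    let entries = std::fs::read_dir("/proc/self/fd").ok()?;
    Some(entries.filter_map(Result::ok).count() as u64)
}

/// Other platforms: compiled to a constant None, so no /proc read is ever
/// attempted at scrape time.
#[cfg(not(target_os = "linux"))]
pub fn open_fds() -> Option<u64> {
    None
}

/// Hypothetical exposition helper: emits the process_open_fds line only
/// when a value is available.
pub fn render_process_metrics() -> String {
    match open_fds() {
        Some(n) => format!("process_open_fds {n}\n"),
        None => String::new(),
    }
}
```

The listing walks one directory entry per descriptor, which is consistent with the FD walk dominating scrape cost when the process holds thousands of connections.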
Documentation
- New docs/METRICS.md and docs/FR/METRICS-FR.md: full per-metric reference grouped by category (process, OTLP ingestion, analysis and findings, GreenOps), with cardinality, label catalog, and the per-scrape cost note for the process collector.
- New docs/SUPPLY-CHAIN.md and docs/FR/SUPPLY-CHAIN-FR.md (from #10): pinning policy reference for contributors and reviewers.
- docs/RUNBOOK.md and docs/FR/RUNBOOK-FR.md extended with two diagnostic recipes ("Diagnosing OTLP drops" and "Reading Report warnings"), including the rationale for why payload_too_large is not counted by the new counter.
- README.md and README-FR.md mention warning_details and the new /metrics surfaces in the daemon section, with cross-links to the new docs.
Install
Prebuilt binaries (Linux amd64 / arm64, macOS arm64, Windows amd64):
curl -LO https://github.com/robintra/perf-sentinel/releases/download/v0.5.19/perf-sentinel-linux-amd64
chmod +x perf-sentinel-linux-amd64
sudo mv perf-sentinel-linux-amd64 /usr/local/bin/perf-sentinel
Linux binaries are statically linked against musl and run on any distribution (Alpine, Debian, RHEL, Ubuntu, any version) regardless of glibc version, and inside FROM scratch images.
From crates.io:
cargo install perf-sentinel --version 0.5.19
Docker:
docker run --rm -p 4317:4317 -p 4318:4318 \
  ghcr.io/robintra/perf-sentinel:0.5.19 watch --listen-address 0.0.0.0
Also available on Docker Hub: robintrassard/perf-sentinel:0.5.19.
Helm (chart 0.2.22 ships 0.5.19 as its appVersion default):
helm install perf-sentinel oci://ghcr.io/robintra/charts/perf-sentinel \
--version 0.2.22 \
  --namespace observability --create-namespace
Verify the binary against SHA256SUMS.txt:
curl -LO https://github.com/robintra/perf-sentinel/releases/download/v0.5.19/SHA256SUMS.txt
sha256sum -c SHA256SUMS.txt --ignore-missing
Full diff: v0.5.18...v0.5.19