github robintra/perf-sentinel v0.8.7


What's new in v0.8.7

v0.8.7 makes previously silent loss paths observable, adds a daemon settings advisor, hardens the daemon and batch CLI against edge cases exposed by a high-scale limit-testing campaign, and refreshes the embedded carbon data against primary sources. Four new Prometheus counters surface dropped OTLP spans, an over-cap service fleet, and correlator evictions. The daemon now flags undersized configuration at runtime, applies admission control to the cross-trace correlator to bound memory on wide topologies, and keeps /health responsive under saturation floods. There are no breaking changes to the daemon wire protocol, the configuration format, or any existing command. The minimum supported Rust version stays 1.96.0. The release-gate lab validation passed 35 of 35 scenarios.

Daemon: loss observability

Four new counters make previously silent drop paths visible. perf_sentinel_otlp_spans_received_total and perf_sentinel_otlp_spans_filtered_total{reason} together expose the retention ratio of the deliberate I/O filter. This matters because a fleet whose instrumentation strips db.statement or http.url has every request filtered down to zero events while every request still returns success; this counter pair is the only signal that makes that failure visible. perf_sentinel_service_io_ops_overflow_total counts I/O ops that received no per-service attribution because the 1024-service metering cap was reached, and perf_sentinel_correlator_pairs_evicted_total counts cross-trace pairs dropped at the max_tracked_pairs cap.
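The retention signal described above can be derived from two scrapes of the counter pair. The following is a hypothetical Rust helper, not part of perf-sentinel. It assumes both deltas cover the same window, that filtered spans are summed across every reason label, and that each filtered span was first counted as received. A ratio below 1 is expected from the deliberate filter; a ratio of exactly 0 with nonzero traffic is the blind-fleet case.

```rust
/// Hypothetical helper: fraction of received OTLP spans that survived the
/// I/O filter, from deltas of perf_sentinel_otlp_spans_received_total and
/// perf_sentinel_otlp_spans_filtered_total (assumed summed over all reasons).
/// Returns None when no spans arrived in the window.
pub fn retention_ratio(received_delta: u64, filtered_delta: u64) -> Option<f64> {
    if received_delta == 0 {
        return None;
    }
    // saturating_sub guards against counters scraped at slightly different instants.
    let kept = received_delta.saturating_sub(filtered_delta);
    Some(kept as f64 / received_delta as f64)
}

/// True for the silent failure mode: spans keep arriving, none are analyzable
/// (for example because instrumentation strips db.statement or http.url).
pub fn is_silently_blind(received_delta: u64, filtered_delta: u64) -> bool {
    matches!(retention_ratio(received_delta, filtered_delta), Some(r) if r == 0.0)
}
```

In practice an alert would fire only when traffic is nonzero and retention has sat at zero for some sustained window, since short gaps in traffic return None rather than 0.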

Daemon: settings advisor

/api/export/report now emits tuning entries in Report.warning_details when a lifetime counter shows that a configuration knob is undersized for the observed load. Each message names the knob, its current value, and the suggested adjustment. Six rules cover analysis-queue shedding, ingest-queue rejections, a near-full trace window, the per-service metering cap, correlator pair evictions, and zero analyzable-span retention. The advisor reads the config snapshot taken at daemon startup, so a hint always reflects the values the running process actually uses. This complements the static comfort-zone warnings emitted at startup with evidence gathered at runtime.
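One of the six rules, correlator pair evictions, can be sketched as a pure function of the startup config snapshot and a lifetime counter. The struct and field names below are hypothetical, and the "double the cap" suggestion is an assumed heuristic; the daemon's actual wording and thresholds may differ.

```rust
/// Hypothetical snapshot of the knob this rule reads, taken once at startup.
#[derive(Debug, Clone)]
pub struct ConfigSnapshot {
    pub max_tracked_pairs: u64,
}

/// Hypothetical lifetime counter values read at report-export time.
#[derive(Debug, Clone, Default)]
pub struct LifetimeCounters {
    pub correlator_pairs_evicted: u64,
}

/// Illustrative advisor rule: any eviction means the cap was undersized for
/// the observed load. The message names the knob, its current value, and a
/// suggested adjustment (doubling is an assumed heuristic).
pub fn correlator_eviction_hint(cfg: &ConfigSnapshot, c: &LifetimeCounters) -> Option<String> {
    if c.correlator_pairs_evicted == 0 {
        return None;
    }
    let suggested = cfg.max_tracked_pairs.saturating_mul(2);
    Some(format!(
        "max_tracked_pairs = {} evicted {} cross-trace pairs; consider raising it to {}",
        cfg.max_tracked_pairs, c.correlator_pairs_evicted, suggested
    ))
}
```

Keeping each rule a pure function of (snapshot, counters) is what lets the hint reflect the values the running process actually uses, rather than whatever the config file says after a later edit.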

Daemon: wide-topology and saturation hardening

The cross-trace correlator now applies admission control to new pairs within a batch. The max_tracked_pairs cap was previously enforced only at batch end, so a single batch of findings from a wide topology could insert millions of pair entries before the first eviction ran, and the map's high-water capacity was never returned to the allocator: at 1500 services the daemon used to exhaust a 256Mi pod in about a minute. With admission control the same load holds a flat resident set near 57 MiB.

Concurrent OTLP decode is now bounded on both the HTTP and gRPC paths, so a saturation flood can no longer monopolize protobuf-decode CPU and starve the /health liveness probe. The ingest enqueue waits at most two seconds for a slot, then rejects retryably (HTTP 503, gRPC UNAVAILABLE) and counts channel_full. Previously it parked the sender until the request timeout without ever recording a rejection.
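The difference between batch-end eviction and insert-time admission can be illustrated with a small bounded map. This is a sketch of the general technique, not perf-sentinel's correlator: it assumes new pairs are simply refused once the cap is reached, and the type and field names are hypothetical.

```rust
use std::collections::HashMap;

/// Illustrative bounded pair store: (source, target) finding keys mapped to a
/// co-occurrence count. Names are hypothetical.
pub struct BoundedPairs {
    pairs: HashMap<(String, String), u64>,
    max_tracked_pairs: usize,
    /// Analogue of perf_sentinel_correlator_pairs_evicted_total.
    pub refused: u64,
}

impl BoundedPairs {
    pub fn new(max_tracked_pairs: usize) -> Self {
        Self { pairs: HashMap::new(), max_tracked_pairs, refused: 0 }
    }

    /// Admission control at insert time: existing pairs are always updated,
    /// new pairs are admitted only while the map is under the cap. The entry
    /// count therefore never exceeds the cap mid-batch, so the map's capacity
    /// never balloons ahead of a later eviction pass.
    pub fn observe(&mut self, a: &str, b: &str) {
        let key = (a.to_owned(), b.to_owned());
        if let Some(count) = self.pairs.get_mut(&key) {
            *count += 1;
        } else if self.pairs.len() < self.max_tracked_pairs {
            self.pairs.insert(key, 1);
        } else {
            self.refused += 1;
        }
    }

    pub fn len(&self) -> usize {
        self.pairs.len()
    }

    pub fn is_empty(&self) -> bool {
        self.pairs.is_empty()
    }
}
```

With batch-end trimming, the same loop would first grow the map to every distinct pair in the batch and only then shrink it; HashMap does not give that peak capacity back unless shrink_to_fit is called, which matches the unreturned high-water capacity described above.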

Batch CLI: input cap decoupled from the network limit

analyze, diff, report, explain, calibrate, pg-stat and bench now read local input files under a fixed 1 GiB cap instead of the daemon's network payload limit, whose 100 MiB ceiling made any larger trace export unanalyzable. Oversized files are rejected based on their file metadata before a single byte is read. The HTML dashboard now caps the number of findings it embeds, keeping critical findings first, and shows a banner with the kept-versus-total split, so a batch with tens of thousands of findings no longer produces a 50 MB file. The JSON report keeps the full set, and --max-traces-embedded opts out of size targeting entirely.
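A metadata-first size check takes only a few lines with the standard library. This is a sketch of the technique, not the CLI's code; read_capped and MAX_INPUT_BYTES are hypothetical names. It also reads at most cap + 1 bytes, so a file that grows after the metadata check is still caught.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Assumed constant matching the 1 GiB cap described above.
pub const MAX_INPUT_BYTES: u64 = 1 << 30;

/// Reads a local input file only if its metadata size is within `cap`.
/// Oversized files are rejected before any content is read.
pub fn read_capped(path: &Path, cap: u64) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let len = file.metadata()?.len();
    if len > cap {
        return Err(too_large(path, cap));
    }
    let mut buf = Vec::with_capacity(len as usize);
    // Read at most cap + 1 bytes: one extra byte reveals a file that grew
    // after the metadata check without reading the rest of it.
    file.take(cap.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > cap {
        return Err(too_large(path, cap));
    }
    Ok(buf)
}

fn too_large(path: &Path, cap: u64) -> io::Error {
    io::Error::new(
        io::ErrorKind::InvalidInput,
        format!("{} exceeds the {}-byte input cap", path.display(), cap),
    )
}
```

A typical call would be read_capped(Path::new("traces.json"), MAX_INPUT_BYTES); the cap is a parameter so tests can exercise the rejection path with tiny files.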

Performance

The ISO 8601 timestamp parser gains a fixed-layout fast path for the canonical form every converter emits, dropping per-parse allocations and taking the common case from 69.5 ns to 8.4 ns. The daemon's per-service meter caches its labeled Prometheus counter children, so the per-event path is one map lookup and an atomic add rather than a label hash plus a metric-vector lock. The batch CLI frees the raw input buffer before analysis starts, and the HTML trim path serializes findings once instead of cloning the whole report. New measurement infrastructure ships alongside: a seeded synthetic trace generator, a criterion suite over every pipeline stage, bench --synthetic-events for fixture-free runs, and a profiling build profile for flamegraphs.
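A fixed-layout fast path checks separators at known byte offsets and converts digits in place, with no allocation. The sketch below assumes the canonical form is YYYY-MM-DDTHH:MM:SS.mmmZ (24 bytes, UTC, millisecond precision); the notes do not say which exact form the converters emit, and the function names are hypothetical. Anything that does not match returns None so a caller can fall back to a general parser. Date-to-day conversion uses Howard Hinnant's days_from_civil algorithm.

```rust
/// Parses `len` ASCII digits starting at `start`, without allocating.
fn digits(b: &[u8], start: usize, len: usize) -> Option<i64> {
    let mut v: i64 = 0;
    for &c in &b[start..start + len] {
        if !c.is_ascii_digit() {
            return None;
        }
        v = v * 10 + i64::from(c - b'0');
    }
    Some(v)
}

fn is_leap(y: i64) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_month(y: i64, m: i64) -> i64 {
    match m {
        2 if is_leap(y) => 29,
        2 => 28,
        4 | 6 | 9 | 11 => 30,
        _ => 31,
    }
}

/// Days since 1970-01-01 in the proleptic Gregorian calendar (Hinnant).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400;
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Fast path for the assumed layout YYYY-MM-DDTHH:MM:SS.mmmZ.
/// Returns nanoseconds since the Unix epoch, or None to signal "use the
/// general parser" (other layouts, offsets, leap seconds, invalid dates).
pub fn parse_canonical_nanos(s: &str) -> Option<i64> {
    let b = s.as_bytes();
    if b.len() != 24
        || b[4] != b'-' || b[7] != b'-' || b[10] != b'T' || b[13] != b':'
        || b[16] != b':' || b[19] != b'.' || b[23] != b'Z'
    {
        return None;
    }
    let (year, month, day) = (digits(b, 0, 4)?, digits(b, 5, 2)?, digits(b, 8, 2)?);
    let (hour, min, sec) = (digits(b, 11, 2)?, digits(b, 14, 2)?, digits(b, 17, 2)?);
    let millis = digits(b, 20, 3)?;
    if !(1..=12).contains(&month) || day < 1 || day > days_in_month(year, month)
        || hour > 23 || min > 59 || sec > 59
    {
        return None;
    }
    let secs = days_from_civil(year, month, day) * 86_400 + hour * 3_600 + min * 60 + sec;
    Some(secs * 1_000_000_000 + millis * 1_000_000)
}
```

The fast path only has to be correct for the layout it accepts; rejecting everything else keeps the general parser as the single source of truth for the long tail of ISO 8601 forms.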

Carbon data refresh

The embedded grid intensities are refreshed against primary sources (Electricity Maps consumption-based 2023-2024, corroborated by Ember). The Paris regions move from 56 to 41 gCO2eq/kWh, Sao Paulo from 62 to 96, and Belgium from 187 to 165. The eu-central-1 hourly profile is rescaled from a stale 2022 coal-crisis level to the current grid level. This resolves a long-standing divergence from the annual table value, so the profile-versus-annual invariant now holds for every region. The generic PUE fallback rises from 1.2 to 1.5, tracking the Uptime Institute survey weighted average, while hyperscaler regions keep their own provider PUE. A fabricated database-energy citation is replaced with Z. Xu, Y.-C. Tu and X. Wang, "Exploring Power-Performance Tradeoffs in Database Systems", IEEE ICDE 2010. The provenance of the network transport coefficient is corrected, and the SCI wording is aligned with the specification revisions.
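To estimate how the refresh moves a given report, it helps to treat operational emissions as a product of factors. The sketch assumes a simple multiplicative model (IT energy in kWh, scaled by PUE, times grid intensity in gCO2eq/kWh), in the spirit of the SCI operational term; perf-sentinel's full methodology may include further terms. Under that assumption each factor change scales the output by new/old: the generic PUE bucket by 1.5/1.2 = 1.25, a Paris-region report by 41/56 ≈ 0.73, and Sao Paulo by 96/62 ≈ 1.55, all else equal.

```rust
/// Operational emissions in gCO2eq under an assumed multiplicative model.
pub fn operational_gco2(it_energy_kwh: f64, pue: f64, intensity_g_per_kwh: f64) -> f64 {
    it_energy_kwh * pue * intensity_g_per_kwh
}

/// Combined output ratio when several multiplicative factors change.
/// Each entry is an (old, new) pair; unchanged factors can be omitted.
pub fn combined_shift(changes: &[(f64, f64)]) -> f64 {
    changes.iter().map(|&(old, new)| new / old).product()
}
```

For example, a hypothetical region that sat in the generic PUE bucket and also had its intensity refreshed would shift by combined_shift(&[(1.2, 1.5), (old_intensity, new_intensity)]).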

Documentation

ARCHITECTURE.md, METRICS.md, RUNBOOK.md, CONFIGURATION.md, LIMITATIONS.md, METHODOLOGY.md and the per-stage design notes are updated for the new counters, the tuning advisor, the bounded enqueue, and the decoupled batch cap, with the French mirrors updated in lockstep. RUNBOOK.md gains a measured sizing reference drawn from the saturation curve. The stale benchmark tables are dropped from the design docs, which now point at git history instead.

Helm chart

charts/perf-sentinel moves from 0.2.51 to 0.2.52 and appVersion from 0.8.6 to 0.8.7. The template surface is unchanged; only metadata is added.

Operator-visible behavior change

Carbon outputs shift for the refreshed regions: Paris and Frankfurt hourly reports drop, Sao Paulo rises, and any region in the generic PUE bucket rises by a factor of 1.5/1.2 = 1.25, all else being equal. Under sustained saturation the gRPC ingest path now returns UNAVAILABLE instead of INTERNAL; compliant OTLP exporters retry UNAVAILABLE rather than dropping the batch. Deployments running under sustained saturation should give the liveness probe headroom, for example timeoutSeconds: 5 and failureThreshold: 5. The detect and score verdicts, the daemon routes, the OTLP wire shape, and the existing configuration keys are unchanged.
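The reason the status-code change matters lies on the exporter side. The OTLP specification lists which gRPC codes and HTTP statuses a client should retry; INTERNAL is not among them, so a batch rejected with INTERNAL is dropped, while UNAVAILABLE (and HTTP 503) is retried with backoff. The sketch below encodes that decision. The GrpcCode enum is a hypothetical local type, not the tonic or grpc crate API, and it covers only the codes relevant here.

```rust
/// Hypothetical local subset of gRPC status codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GrpcCode {
    Ok,
    Cancelled,
    InvalidArgument,
    DeadlineExceeded,
    ResourceExhausted,
    Aborted,
    OutOfRange,
    Internal,
    Unavailable,
    DataLoss,
}

/// Retry decision for OTLP/gRPC, following the OTLP specification's list of
/// retryable codes. RESOURCE_EXHAUSTED is retryable only when the server
/// attaches RetryInfo signalling that recovery is possible.
pub fn grpc_retryable(code: GrpcCode, has_retry_info: bool) -> bool {
    match code {
        GrpcCode::Cancelled
        | GrpcCode::DeadlineExceeded
        | GrpcCode::Aborted
        | GrpcCode::OutOfRange
        | GrpcCode::Unavailable
        | GrpcCode::DataLoss => true,
        GrpcCode::ResourceExhausted => has_retry_info,
        _ => false,
    }
}

/// Same decision for OTLP/HTTP response status codes.
pub fn http_retryable(status: u16) -> bool {
    matches!(status, 429 | 502 | 503 | 504)
}
```

This is why the daemon's switch from INTERNAL to UNAVAILABLE turns a dropped batch into a delayed one, and why the HTTP path's 503 is handled the same way.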

Why this is a patch and not a minor

The release is additive and backward compatible. The four counters, the tuning advisor, and the measurement infrastructure are new additions. The correlator admission control, the bounded enqueue, the bounded decode concurrency, and the findings trim are internal hardening with no change to detection verdicts, daemon routes, the OTLP wire protocol, or existing configuration keys. The batch input cap is raised, not lowered. The carbon refresh changes output values but not any schema or interface. The minimum supported Rust version stays 1.96.0.

Verifying this release

# Binary integrity via SLSA Build L3 attestation
gh attestation verify perf-sentinel-linux-amd64 \
  --owner robintra --repo perf-sentinel

# A periodic disclosure produced by this binary
perf-sentinel verify-hash --report perf-sentinel-report.json \
  --expected-identity "https://github.com/robintra/perf-sentinel/.github/workflows/release.yml@refs/tags/v0.8.7" \
  --expected-issuer "https://token.actions.githubusercontent.com"

gh CLI 2.49 or newer is required for gh attestation verify (unchanged since v0.7.2).

Full Changelog: v0.8.6...v0.8.7
