Automated release from CI pipeline
Changes:
perf(beyond-sota): ADR-154 M2 — FFT planner hoist (1.84x, bit-identical) + 3 honest perf nulls + boundary tests (#1055)
- perf(signal): hoist FFT planner across subcarriers (ADR-154 §7.4 #20)
compute_multi_subcarrier_spectrogram called compute_spectrogram once per
subcarrier, and each call built a fresh FftPlanner + re-planned the same
length-window_size FFT. Hoist the plan + window out of the per-subcarrier
loop via a new compute_spectrogram_with_plan core that takes a pre-planned
FFT (held in an Arc) and a pre-built window. compute_spectrogram delegates to it
(unchanged behaviour); the multi-subcarrier path plans once and reuses.
MEASURED-HOT (dsp_perf_bench, this box): at 56 subcarriers, window 128,
fresh-planner-per-subcarrier 467.88 µs -> hoisted-plan 254.75 µs = 1.84x;
window 256: 627.27 µs -> 448.39 µs = 1.40x. Plan-forward cost alone is
~1.86 µs (w128); x56 subcarriers that is ~104 µs, about half of the 213.13 µs
removed delta (the hoist also lifts window construction out of the loop).
Output is bit-identical: multi_subcarrier_hoisted_plan_bit_identical
compares f64::to_bits of every spectrogram value + freq/time resolution
against the per-call fresh-planner path across all 4 window functions x
{power,magnitude} on a 56-subcarrier matrix. The numeric STFT body is the
old loop verbatim; only plan/window construction is lifted.
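A minimal sketch of the hoist pattern described above, not the crate's code. DftPlan is a hypothetical stand-in (a naive DFT with precomputed twiddles) for the real planned FFT, and spectra_fresh_plan, spectra_hoisted_plan and bit_identical are illustrative names. It only shows why lifting plan construction out of the loop cannot change the output bits: the per-frame arithmetic is untouched.

```rust
use std::f64::consts::PI;
use std::sync::Arc;

/// Hypothetical stand-in for a planned FFT: precomputed twiddle factors for
/// a naive O(n^2) DFT. The real code plans with an FFT library instead.
pub struct DftPlan {
    n: usize,
    twiddles: Vec<(f64, f64)>, // (cos, sin) of -2*pi*k/n for k in 0..n
}

impl DftPlan {
    pub fn new(n: usize) -> Self {
        let twiddles = (0..n)
            .map(|k| {
                let a = -2.0 * PI * (k as f64) / (n as f64);
                (a.cos(), a.sin())
            })
            .collect();
        DftPlan { n, twiddles }
    }

    /// Power spectrum |X[k]|^2 of one real-valued frame of length n.
    pub fn power(&self, frame: &[f64]) -> Vec<f64> {
        assert_eq!(frame.len(), self.n);
        (0..self.n)
            .map(|k| {
                let (mut re, mut im) = (0.0, 0.0);
                for (t, &x) in frame.iter().enumerate() {
                    let (c, s) = self.twiddles[(k * t) % self.n];
                    re += x * c;
                    im += x * s;
                }
                re * re + im * im
            })
            .collect()
    }
}

/// Before: a fresh plan is built for every subcarrier.
pub fn spectra_fresh_plan(subcarriers: &[Vec<f64>]) -> Vec<Vec<f64>> {
    subcarriers
        .iter()
        .map(|sc| DftPlan::new(sc.len()).power(sc))
        .collect()
}

/// After: the plan is built once and shared across all subcarriers.
pub fn spectra_hoisted_plan(subcarriers: &[Vec<f64>], n: usize) -> Vec<Vec<f64>> {
    let plan = Arc::new(DftPlan::new(n));
    subcarriers.iter().map(|sc| plan.power(sc)).collect()
}

/// Bit-level equality via f64::to_bits, stricter than `==`
/// (it distinguishes -0.0 from 0.0).
pub fn bit_identical(a: &[Vec<f64>], b: &[Vec<f64>]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| {
            x.len() == y.len() && x.iter().zip(y).all(|(p, q)| p.to_bits() == q.to_bits())
        })
}
```

Calling both functions on the same 56-row matrix and passing the results to bit_identical mirrors the shape of the regression test named above, under these stand-in assumptions.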
Co-Authored-By: claude-flow <ruv@ruv.net>
Three "+ test" backlog gaps closed — pure additions, no behaviour change
(phase_align refactor is internal: estimate_phase_offsets still returns the
identical offset vector; a counted core is split out only to observe the
iteration count).
#14 cir.rs fft_operator — fft_operator_within_tolerance_of_dense_canonical56:
the opt-in FFT Φ/Φᴴ path changes the witness hash, so pin it numerically
CLOSE to the dense path (not silently divergent). Asserts the full Cir
output (every tap within 1e-2·dominant, dominant idx/ratio, active_tap_count,
ranging_valid, rms_delay_spread) on the production canonical-56 config
across τ ∈ {20,50,90} ns. Extends the existing HT20/single-τ test.
#16 phase_align.rs — refinement_terminates_at_iteration_cap_when_not_converging:
forces non-convergence (tolerance=0.0, unreachable) and asserts the loop
runs exactly max_iterations then returns — proving the cap, not convergence,
bounds the loop (no infinite spin). Companion
refinement_converges_before_cap_on_easy_input proves the cap is an upper
bound, not the only exit.
#19 csi_ratio.rs — ratio_finite_at_and_below_1e_12_epsilon: the module
implements the CSI ratio as the conjugate product H_i·conj(H_j) (no
division), so it is finite even at/below the 1e-12 magnitude boundary a
naive H_i/H_j division would need an epsilon to guard. Pins finiteness +
bit-exact conjugate product at the boundary (zero target → zero, never
inf/NaN), through the amplitude/phase extraction.
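A minimal sketch of the #19 point, assuming a hand-rolled complex type. C64, conj_product, amplitude_phase and naive_ratio are illustrative names, not the module's API. It shows that H_i·conj(H_j) uses only multiplies and adds, so a zero or tiny H_j yields a finite result, while the naive division path does not.

```rust
/// Minimal complex number; a stand-in for whatever complex type the module uses.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct C64 {
    pub re: f64,
    pub im: f64,
}

/// Conjugate product a * conj(b): no division, so no epsilon guard is needed.
pub fn conj_product(a: C64, b: C64) -> C64 {
    C64 {
        re: a.re * b.re + a.im * b.im,
        im: a.im * b.re - a.re * b.im,
    }
}

/// Amplitude and phase extracted from the conjugate product.
pub fn amplitude_phase(z: C64) -> (f64, f64) {
    (z.re.hypot(z.im), z.im.atan2(z.re))
}

/// Naive ratio a / b, shown only for contrast: it divides by |b|^2,
/// which is 0.0 for a zero reference and underflows for very small ones.
pub fn naive_ratio(a: C64, b: C64) -> C64 {
    let d = b.re * b.re + b.im * b.im;
    let p = conj_product(a, b);
    C64 { re: p.re / d, im: p.im / d }
}
```

With a zero input, conj_product returns exactly zero and amplitude_phase stays finite, whereas naive_ratio against a zero reference produces NaN.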
cargo test -p wifi-densepose-signal --no-default-features --lib: 447 passed,
0 failed; --features cir --lib: 447 passed, 0 failed.
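A minimal sketch of the counted-core idea behind the #16 tests, under stated assumptions. refine_counted and refine are hypothetical names, and the halving update is a toy stand-in for the real phase refinement. The counted core returns the iteration count so a test can tell a cap exit from a convergence exit, while the public wrapper returns the same estimate as before.

```rust
/// Hypothetical counted refinement core: returns (estimate, iterations run).
/// Each step halves the distance to `target`. The loop exits when the step
/// size drops below `tolerance` or after `max_iterations`, whichever is first.
pub fn refine_counted(
    start: f64,
    target: f64,
    tolerance: f64,
    max_iterations: usize,
) -> (f64, usize) {
    let mut x = start;
    let mut iterations = 0;
    while iterations < max_iterations {
        let step = 0.5 * (target - x);
        x += step;
        iterations += 1;
        // With tolerance == 0.0 this is never true, so only the cap ends the loop.
        if step.abs() < tolerance {
            break;
        }
    }
    (x, iterations)
}

/// Public wrapper: same estimate as the counted core, count discarded.
pub fn refine(start: f64, target: f64, tolerance: f64, max_iterations: usize) -> f64 {
    refine_counted(start, target, tolerance, max_iterations).0
}
```

A tolerance of 0.0 makes the convergence check unreachable, so the count equals max_iterations. A loose tolerance on the same input exits well before the cap.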
Co-Authored-By: claude-flow <ruv@ruv.net>
- docs(adr-154): record Milestone-2 P2-perf verdicts + boundary tests (§7.4)
§7.4: #20 MEASURED-HOT (1.40–1.84× spectrogram FFT-plan hoist, bit-identical);
#5/#6/#7 MEASURED-NULL (benched, not hot, left as-is — sub-µs / stack-only /
alloc-once); #8 MEASUREMENT-ONLY (per-call 56×56 eigh cost; eigenvalue/BLAS
backend un-buildable on this Windows host, number deferred to a BLAS box, NOT
fabricated; also corrects the finding — extract_perturbation reuses cached
modes, the recompute is in estimate_occupancy). #14/#16/#19 RESOLVED (tolerance
/ convergence-cap / epsilon-boundary tests). Updated §7.4 intro + Horizon-ledger
(deferred count 41→36). CHANGELOG [Unreleased] entry added.
Co-Authored-By: claude-flow <ruv@ruv.net>
New dsp_perf_bench.rs backs every Milestone-2 perf verdict with a committed
criterion bench — no speedup claimed without a before/after number here, and
a benched NULL is the proof a micro-opt was unnecessary (the §5.x "already
amortized" pattern). Registered in Cargo.toml [[bench]].
MEASURED (this box, criterion medians):
#20 spectrogram_multi_subcarrier (fresh vs hoisted plan):
MEASURED-HOT — 467.88→254.75 µs (1.84x) @ sc56/w128; 627.27→448.39 µs
(1.40x) @ sc56/w256. Optimized in the prior commit.
#5 multistatic_attention/weights: MEASURED-NULL — 181 ns (2 nodes) ..
848 ns (8 nodes); sub-µs, no hot-path alloc — left as-is.
#6 tomography_reconstruct/solve: MEASURED-NULL — 47.5 µs (16 links) /
60.4 µs (32 links) for a full 50-iter ISTA solve; the 2 per-solve voxel
buffers (~4 KB) are negligible vs O(iters·links·voxels) compute, and
reconstruct(&self) reuses them across iterations already — left as-is.
#7 pose_kalman_update/cycles: MEASURED-NULL — 150 ns (17 kpts) / 2.82 µs
(170 kpts); the Kalman "gain matrices" are fixed-size STACK arrays
([[f32;3];6]), zero heap — nothing to reuse — left as-is.
#8 field_model_occupancy (eigenvalue feature): MEASUREMENT-ONLY — quantifies
the per-call n×n eigendecomposition cost; incremental SVD is a sized
future project, not attempted (number recorded in ADR-154 §7.4).
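The #20 figures above can be checked with a few lines of arithmetic. This is a sketch with illustrative helper names (speedup, round2, removed_delta, total_plan_cost); the inputs are the medians quoted in these notes, nothing is re-measured here.

```rust
/// Speedup factor: time before divided by time after.
pub fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

/// Round to two decimals, the precision used in the verdicts.
pub fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}

/// Absolute time removed by the optimization, in microseconds.
pub fn removed_delta(before_us: f64, after_us: f64) -> f64 {
    before_us - after_us
}

/// Total planning cost when one plan is built per subcarrier.
pub fn total_plan_cost(per_plan_us: f64, subcarriers: u32) -> f64 {
    per_plan_us * f64::from(subcarriers)
}
```

For sc56/w128, speedup(467.88, 254.75) rounds to 1.84 and removed_delta gives 213.13 µs; for sc56/w256, speedup(627.27, 448.39) rounds to 1.40. total_plan_cost(1.86, 56) is about 104.16 µs.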
Reproduce:
cargo bench -p wifi-densepose-signal --no-default-features --bench dsp_perf_bench
cargo bench -p wifi-densepose-signal --bench dsp_perf_bench # adds #8
Cargo.lock: dev-dep (criterion/clap) graph + crate version bumps from the
build; no runtime-dependency change.
Co-Authored-By: claude-flow <ruv@ruv.net>
Docker Image:
ghcr.io/ruvnet/RuView:865f9dee77520365ff3adac2fea818fc685178ab