github lance0/xfr v0.9.14


The durable fix for live UDP loss visibility under upload-mode saturation (issue #70). Three swings (v0.9.11, v0.9.12, v0.9.13) addressed pieces of the problem; v0.9.14 ships the architectural answer.

Install

cargo install xfr                        # crates.io
brew install lance0/tap/xfr              # macOS / Linux Homebrew
# or grab a binary from this release page

Fixed

  • Live UDP loss counter no longer stalls under upload-mode saturation (#70 final fix). v0.9.13's TCP_NODELAY was necessary but not sufficient. The server's stats sampling timer used tokio::time::interval, which defaults to MissedTickBehavior::Burst. When writer.write_all() stalled under back-pressure (the TCP control channel competing for ACKs against the saturated UDP uplink), missed ticks accumulated and then fired as a burst once the writer unblocked. The result was stale interval samples stamped with fresh client-side arrival times. The timer now uses MissedTickBehavior::Skip, which drops the stale ticks; cumulative state still surfaces correctly on the next live tick. The change is unconditional, so pre-v0.9.14 clients paired with a v0.9.14 server also benefit.
  • --omit no longer folds loss from omitted intervals into the first visible interval. The cumulative-loss baseline now advances during the omit window, so the first visible line reports only loss observed during printed intervals.
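
Both fixes above can be illustrated in plain std Rust. This is a simplified model, not tokio's or xfr's code. ticks_ready_on_resume (a hypothetical name) counts how many interval deadlines are ready at once when a sampling task that was blocked in a write polls its timer again, under a Burst-like and a Skip-like policy. LossBaseline (also hypothetical) derives per-interval loss from the cumulative counter and keeps advancing its baseline through omitted intervals. The 1 s period and 3 s stall are illustrative values. In real tokio code, the policy is selected with Interval::set_missed_tick_behavior.

```rust
// Simplified model, NOT tokio's or xfr's implementation. Interval deadlines
// sit at k * period_ms; the sampling task is blocked (for example inside a
// write) from stall_start_ms to stall_end_ms and then polls its timer again.
#[derive(Clone, Copy, Debug)]
enum MissedTicks {
    Burst, // every missed deadline fires back-to-back on resume
    Skip,  // one overdue tick fires; the rest are dropped
}

/// Number of ticks that are ready immediately when the stalled task resumes.
fn ticks_ready_on_resume(period_ms: u64, stall_start_ms: u64, stall_end_ms: u64, policy: MissedTicks) -> u64 {
    // Deadlines in (stall_start_ms, stall_end_ms] passed while blocked.
    let missed = (stall_end_ms / period_ms).saturating_sub(stall_start_ms / period_ms);
    match policy {
        MissedTicks::Burst => missed,
        MissedTicks::Skip => missed.min(1),
    }
}

/// Hypothetical per-interval loss reporter that honors an omit window.
struct LossBaseline {
    last_cumulative_lost: u64,
}

impl LossBaseline {
    /// Returns the loss to print for this interval, or None if it is omitted.
    /// The baseline advances on every interval, omitted or not, so loss seen
    /// during the omit window never lands in the first printed line.
    fn on_interval(&mut self, cumulative_lost: u64, omitted: bool) -> Option<u64> {
        let delta = cumulative_lost.saturating_sub(self.last_cumulative_lost);
        self.last_cumulative_lost = cumulative_lost;
        if omitted {
            None
        } else {
            Some(delta)
        }
    }
}

fn main() {
    // Illustrative: a 1 s sampling period and a 3 s write stall.
    let burst = ticks_ready_on_resume(1_000, 0, 3_000, MissedTicks::Burst);
    let skip = ticks_ready_on_resume(1_000, 0, 3_000, MissedTicks::Skip);
    println!("burst: {} ticks at once, skip: {}", burst, skip);

    let mut reporter = LossBaseline { last_cumulative_lost: 0 };
    assert_eq!(reporter.on_interval(40, true), None); // omitted: nothing printed
    assert_eq!(reporter.on_interval(45, false), Some(5)); // not 45
}
```

Under the Burst-like policy, the three deadlines missed during the stall all become ready together. Those back-to-back ticks are the stale samples the fix removes.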

Added

  • UDP receiver feedback (udp_feedback_v1 capability). When both peers advertise it, the server sends a 36-byte UDP packet carrying cumulative (packets_received, packets_lost) back to the client at 2 Hz over the same data socket. This bypasses the TCP control channel for live UDP loss reporting. The wire format is a fixed 36 bytes: b"XFRF" magic, version, kind, flags, stream_id, reserved, elapsed_ms, and the cumulative counts, all big-endian. Receive sites demux by length first, which distinguishes feedback from data without inspecting sequence-number bits. Because the counts are cumulative rather than deltas, the client recovers from any dropped feedback packet on the next tick. Capability negotiation gates emission, so older clients never receive a packet they don't understand.
  • Producer-side monotonic-denominator filter on the client. Both the TCP control udp_progress decode site and the UDP feedback aggregator go through UdpProgressFilter::apply. It admits a reading only if its (received + lost) denominator is at least as large as any seen so far. The check is an atomic CAS via fetch_update, so two producers cannot race a stale store in after a fresh one.
  • Live UDP loss now surfaces in non-TUI output. With --no-tui, the --json-stream, --csv, and plain interval output now reflect the freshest udp_progress from either TCP control or UDP feedback. Previously they used the per-stream streams[].lost from the most recent TCP Interval only, which could be several seconds stale under control-channel stalls.
  • Docker repro harness for #70 at docker/Dockerfile.repro + docker/repro-issue-70.sh. A multi-stage build places the new branch and the v0.9.13 baseline side by side. docker run --rm --cap-add=NET_ADMIN xfr-repro runs hard assertions against the new build; passing --baseline instead prints diagnostics for a narrative comparison.
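
The 36-byte feedback datagram and the length-first demux from the udp_feedback_v1 entry can be sketched as follows. The notes confirm the field order, magic, total size, and big-endian encoding. The individual field widths are an assumption chosen to total 36 bytes: u8 version and kind, u16 flags, u32 stream_id, reserved, and elapsed_ms, and u64 counts. Feedback and account are hypothetical names, not xfr's UdpFeedbackPacket API. The account helper also mirrors the receive_udp change listed under Changed, which excludes feedback datagrams from bytes_received.

```rust
// Assumed field widths (the release notes give only field order and the
// 36-byte total). All multi-byte fields are big-endian:
//   [0..4]   magic b"XFRF"     [4] version (u8)    [5] kind (u8)
//   [6..8]   flags (u16)       [8..12] stream_id (u32)
//   [12..16] reserved (zero)   [16..20] elapsed_ms (u32)
//   [20..28] packets_received (u64, cumulative)
//   [28..36] packets_lost (u64, cumulative)
const FEEDBACK_LEN: usize = 36;
const FEEDBACK_MAGIC: &[u8; 4] = b"XFRF";

#[derive(Debug, Clone, Copy, PartialEq)]
struct Feedback {
    version: u8,
    kind: u8,
    flags: u16,
    stream_id: u32,
    elapsed_ms: u32,
    packets_received: u64,
    packets_lost: u64,
}

/// Read a big-endian unsigned integer of up to 8 bytes.
fn be(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b))
}

impl Feedback {
    fn encode(&self) -> [u8; FEEDBACK_LEN] {
        let mut buf = [0u8; FEEDBACK_LEN];
        buf[0..4].copy_from_slice(FEEDBACK_MAGIC);
        buf[4] = self.version;
        buf[5] = self.kind;
        buf[6..8].copy_from_slice(&self.flags.to_be_bytes());
        buf[8..12].copy_from_slice(&self.stream_id.to_be_bytes());
        // buf[12..16] is the reserved field and stays zero.
        buf[16..20].copy_from_slice(&self.elapsed_ms.to_be_bytes());
        buf[20..28].copy_from_slice(&self.packets_received.to_be_bytes());
        buf[28..36].copy_from_slice(&self.packets_lost.to_be_bytes());
        buf
    }

    /// Length-first check: only an exactly-36-byte datagram with the magic
    /// prefix is treated as feedback; sequence-number bits are never examined.
    fn decode(buf: &[u8]) -> Option<Feedback> {
        if buf.len() != FEEDBACK_LEN || !buf.starts_with(FEEDBACK_MAGIC) {
            return None;
        }
        Some(Feedback {
            version: buf[4],
            kind: buf[5],
            flags: be(&buf[6..8]) as u16,
            stream_id: be(&buf[8..12]) as u32,
            elapsed_ms: be(&buf[16..20]) as u32,
            packets_received: be(&buf[20..28]),
            packets_lost: be(&buf[28..36]),
        })
    }
}

/// Receive-side accounting: feedback is a control-plane sideband, so only
/// data datagrams count toward bytes_received.
fn account(datagrams: &[&[u8]]) -> (u64, Option<Feedback>) {
    let mut bytes_received = 0u64;
    let mut latest = None;
    for d in datagrams {
        match Feedback::decode(d) {
            Some(fb) => latest = Some(fb),
            None => bytes_received += d.len() as u64,
        }
    }
    (bytes_received, latest)
}

fn main() {
    // Illustrative values only.
    let fb = Feedback { version: 1, kind: 0, flags: 0, stream_id: 7, elapsed_ms: 1_500, packets_received: 9_000, packets_lost: 12 };
    let wire = fb.encode();
    assert_eq!(Feedback::decode(&wire), Some(fb));

    let data = vec![0u8; 1_200]; // a data datagram of a different length
    let (bytes, latest) = account(&[&data[..], &wire[..]]);
    assert_eq!(bytes, 1_200);
    assert_eq!(latest.map(|f| f.packets_lost), Some(12));
}
```

In this sketch, a data datagram that happened to be exactly 36 bytes long and began with XFRF would be misread as feedback. The notes do not say how xfr rules that case out.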

Changed

  • TestProgress schema gains udp_feedback_only: bool, so consumers can distinguish a feedback-only update (where only udp_progress is authoritative) from a full TCP Interval update. This is a pre-1.0 break for downstream library users.
  • Server bidir mode no longer emits UDP feedback — feedback is upload-mode-only by design.
  • receive_udp skips feedback packets in bytes_received accounting — feedback is a control-plane sideband, not test-data wire bandwidth.
  • Capability list factored into a single SUPPORTED_CAPABILITIES const, with a new capability_advertised() helper that centralizes the matching logic.
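
The capability refactor and the upload-only feedback gate can be sketched together. The capability string udp_feedback_v1 comes from these notes. The Direction enum, should_emit_feedback, and the exact capability_advertised signature are hypothetical stand-ins, because the release notes do not specify them.

```rust
// Hypothetical sketch: the real SUPPORTED_CAPABILITIES contents and the exact
// capability_advertised() signature are not documented in the release notes.
const UDP_FEEDBACK_V1: &str = "udp_feedback_v1";
const SUPPORTED_CAPABILITIES: &[&str] = &[UDP_FEEDBACK_V1];

/// Hypothetical test direction; xfr's own mode type may differ.
#[derive(Clone, Copy)]
enum Direction {
    Upload,
    Download,
    Bidir,
}

/// Single matcher for "did this peer advertise `name`?".
fn capability_advertised(peer_caps: &[String], name: &str) -> bool {
    peer_caps.iter().any(|c| c == name)
}

/// Server-side gate for UDP feedback: both peers must advertise the
/// capability, and only upload-mode tests get feedback (never bidir).
fn should_emit_feedback(client_caps: &[String], direction: Direction) -> bool {
    SUPPORTED_CAPABILITIES.contains(&UDP_FEEDBACK_V1)
        && capability_advertised(client_caps, UDP_FEEDBACK_V1)
        && matches!(direction, Direction::Upload)
}

fn main() {
    let new_client = vec![UDP_FEEDBACK_V1.to_string()];
    let old_client: Vec<String> = Vec::new(); // e.g. a client that predates the capability
    assert!(should_emit_feedback(&new_client, Direction::Upload));
    assert!(!should_emit_feedback(&new_client, Direction::Bidir));
    assert!(!should_emit_feedback(&new_client, Direction::Download));
    assert!(!should_emit_feedback(&old_client, Direction::Upload));
    println!("server advertises: {:?}", SUPPORTED_CAPABILITIES);
}
```

Centralizing the match in one helper means that adding a future capability touches only the const and its call site, not every negotiation branch.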

Cross-version compatibility

All pairings work. The change is wire-additive, with no breaking protocol changes:

Pairing | Behavior
v0.9.14 client ↔ v0.9.14 server | UDP feedback active; live loss updates smoothly under saturation
v0.9.14 client ↔ ≤v0.9.13 server | Server doesn't emit feedback; client falls back to TCP udp_progress (exactly v0.9.13 behavior)
≤v0.9.13 client ↔ v0.9.14 server | No feedback emission; Skip on the new server's timer applies anyway, so the old client gets cleaner, non-bursty intervals
Either client ↔ ≤v0.9.10 server | udp_progress field absent; falls back to the per-interval TCP Interval.lost count

Known limitation

Non-TUI interval row cadence can still bunch under extreme loss. v0.9.14 keeps the TUI live counter and the cumulative loss cache fresh via UDP feedback. However, --json-stream, --csv, and plain output still print a row only when a TCP control Interval arrives, and under aggressive synthetic loss the kernel can deliver already-sent intervals in bursts. The lost value printed on each row is the freshest cumulative figure; only the rows themselves can arrive bunched. Documented in KNOWN_ISSUES.md and tracked as a follow-up in ROADMAP.md.

Library API (pre-1.0 break)

  • client::TestProgress gains udp_feedback_only: bool (constructors must supply it)
  • client::UdpProgressFilter, client::UdpFeedbackAggregator — new public types
  • udp::receive_udp signature gains a trailing feedback_enabled: bool
  • udp::receive_udp_feedback_only(...) — new function
  • udp::UdpFeedbackPacket and UDP_FEEDBACK_* constants exported
  • protocol::SUPPORTED_CAPABILITIES and protocol::capability_advertised exported
  • stats::StreamStats::udp_progress_snapshot() exported
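
The monotonic-denominator filter behind client::UdpProgressFilter can be sketched with std atomics. ProgressFilter is a hypothetical stand-in, not xfr's type. It assumes the cumulative counts fit in two u32 halves packed into one AtomicU64, so a single fetch_update both checks the denominator and stores the reading. Keeping the threshold and the reading in separate atomics would reopen the stale-store race these notes describe.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical sketch. Assumption: the cumulative (received, lost) pair fits
// in two u32 halves of one AtomicU64, so each reading is admitted or rejected
// as a unit. The real client::UdpProgressFilter may store state differently.
struct ProgressFilter {
    packed: AtomicU64,
}

fn pack(received: u32, lost: u32) -> u64 {
    (u64::from(received) << 32) | u64::from(lost)
}

fn unpack(v: u64) -> (u32, u32) {
    ((v >> 32) as u32, v as u32)
}

impl ProgressFilter {
    fn new() -> Self {
        ProgressFilter { packed: AtomicU64::new(0) }
    }

    /// Admit a reading only if its denominator (received + lost) is at least
    /// as large as the freshest one stored so far. fetch_update retries the
    /// compare-and-swap, so a stale producer can never overwrite a fresh one.
    fn apply(&self, received: u32, lost: u32) -> bool {
        let denom = u64::from(received) + u64::from(lost);
        self.packed
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                let (r, l) = unpack(cur);
                if denom >= u64::from(r) + u64::from(l) {
                    Some(pack(received, lost))
                } else {
                    None // reject: older than what is already stored
                }
            })
            .is_ok()
    }

    fn snapshot(&self) -> (u32, u32) {
        unpack(self.packed.load(Ordering::Acquire))
    }
}

fn main() {
    let filter = ProgressFilter::new();
    assert!(filter.apply(1_000, 10)); // fresh UDP feedback reading
    assert!(!filter.apply(900, 8)); // stale TCP reading arrives late: rejected
    assert_eq!(filter.snapshot(), (1_000, 10));

    // Two concurrent producers; the final state is the freshest reading.
    let shared = Arc::new(ProgressFilter::new());
    let handles: Vec<_> = (0..2u32)
        .map(|t| {
            let f = Arc::clone(&shared);
            thread::spawn(move || {
                for i in 0..1_000u32 {
                    f.apply(i * 10 + t, 0);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(shared.snapshot(), (9_991, 0));
    println!("freshest: {:?}", shared.snapshot());
}
```

Equal denominators are admitted, which matches the "at-least-as-fresh" rule. This lets a feedback reading and a TCP reading for the same packet count replace each other harmlessly.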

What's Changed

  • UDP receiver feedback (udp_feedback_v1) — issue #70 final fix by @lance0 in #80

Full Changelog: v0.9.13...v0.9.14
