github iopsystems/rezolus v5.15.0

8 hours ago

Added

  • Agent: resilient BPF attach and sampler status. The BPF builder now
    attaches each program individually and tolerates per-program failures
    (load/verify failures stay fatal), so a single dead probe no longer
    takes down its sibling programs. A new GET /samplers endpoint
    reports each sampler as active/disabled/failed with per-program
    attach detail, and recordings capture this status under
    per_source_metadata.<source>.sampler_status. (#954)
  • GPU (NVIDIA): gpu_tensor_utilization now breaks out per-tensor-pipe
    activity via a pipe label — hmma (FP16/BF16, and FP32 matmul that
    runs as TF32), imma (integer), and dfma (FP64) — alongside the
    existing aggregate (pipe=any). Collected from NVML GPM, so it
    requires Hopper+ and is reported only where the corresponding pipe is
    supported. (#946)
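
As a rough illustration of the per-program attach behaviour described for #954, the sketch below attaches each program on its own, records the outcome, and derives a sampler-level status from the results. It is not the Rezolus implementation: the types `AttachResult` and `SamplerStatus`, the functions `attach_all` and `summarize`, and the rule "active if at least one program attached" are all assumptions made for this example.

```rust
/// Outcome of attaching a single BPF program (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum AttachResult {
    Attached,
    Failed(String),
}

/// Sampler-level status, mirroring the active/disabled/failed wording
/// in the release note (hypothetical type).
#[derive(Debug, PartialEq)]
enum SamplerStatus {
    Active,
    Disabled,
    Failed,
}

/// Attach every program independently. A failure is recorded against
/// that program only and does not stop the remaining attaches.
fn attach_all<F>(programs: &[&str], mut attach: F) -> Vec<(String, AttachResult)>
where
    F: FnMut(&str) -> Result<(), String>,
{
    programs
        .iter()
        .map(|&name| {
            let result = match attach(name) {
                Ok(()) => AttachResult::Attached,
                Err(e) => AttachResult::Failed(e),
            };
            (name.to_string(), result)
        })
        .collect()
}

/// Assumed policy: a sampler that is switched off is Disabled; an
/// enabled sampler is Active if at least one program attached, and
/// Failed otherwise (including when it has no programs at all).
fn summarize(enabled: bool, results: &[(String, AttachResult)]) -> SamplerStatus {
    if !enabled {
        return SamplerStatus::Disabled;
    }
    if results.iter().any(|(_, r)| *r == AttachResult::Attached) {
        SamplerStatus::Active
    } else {
        SamplerStatus::Failed
    }
}
```

Passing the attach step as a closure keeps the sketch independent of any particular BPF library. In a real agent the closure would wrap the library's attach call, and load/verify errors would be handled before this point, since the note says those stay fatal.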

Fixed

  • BPF samplers that rely on in-kernel BTF (cpu_usage, cpu_migrations,
    cpu_perf, scheduler_runqueue, syscall_counts) now work on kernels
    built without /sys/kernel/btf/vmlinux (e.g. NVIDIA Tegra/L4T). Each
    tp_btf hook gains a raw_tp twin selected at runtime via
    kernel_has_btf(), and syscall_counts uses bpf_get_current_task()
    instead of bpf_get_current_task_btf(). CO-RE still uses the external
    BTF file (btf_path). Stock BTF kernels are unaffected. (#948)
  • BPF sampler correctness fixes from a full review against
    docs/principles.md: histogram bucketing used 32-bit shifts,
    mis-bucketing values ≥ 2³¹ (long-tail latencies ≥ ~2.15 s were
    misreported); blockio latency tracking silently dropped all requests
    on kernels < 5.11 due to a tracepoint argument layout difference;
    scheduler/runqueue could charge runqueue-wait and off-cpu time to the
    wrong cgroup; a full ringbuf no longer permanently suppresses a
    cgroup's name; tcp_retransmit now counts segments instead of calls
    (it undercounted with TSO/GSO); plus smaller metadata, histogram, and
    defensive-check fixes. (#956)
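
To make the histogram item in #956 concrete, the sketch below shows one hypothetical way a 32-bit shift can mis-bucket values at or above 2³¹. It is not the code that was fixed. It assumes a plain power-of-two histogram (bucket index = floor(log2(value))) and latencies in nanoseconds, so 2³¹ ns = 2.147483648 s, which matches the "~2.15 s" in the note. Both function names are invented for this example.

```rust
/// Correct bucket index using 64-bit arithmetic: floor(log2(value)),
/// with 0 mapped to bucket 0.
fn bucket_index_64(value: u64) -> u32 {
    if value == 0 {
        0
    } else {
        63 - value.leading_zeros()
    }
}

/// Illustrative buggy variant: each bucket's lower bound is computed
/// with a 32-bit signed shift and only then widened to 64 bits.
/// `1i32 << 31` is negative, so widening sign-extends it into a huge
/// u64 and bucket 31 can never be selected.
fn bucket_index_32bit_shift(value: u64) -> u32 {
    let mut index = 0;
    for i in 0..32u32 {
        let lower = (1i32 << i) as i64 as u64;
        if value >= lower {
            index = i;
        }
    }
    index
}
```

Under these assumptions, a latency of 3,000,000,000 ns (3 s) belongs in bucket 31 but the 32-bit variant places it in bucket 30, while values below 2³¹ land in the same bucket either way.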
