github ruvnet/RuView v1036
Release v1036

2 hours ago

Automated release from CI pipeline

Changes:
feat(cog-person-count): v0.0.2 — K-fold + label-smoothing + temperature-calibrated (#699)

  • chore: stage v0.0.2 artifacts + temperature scalar for build pipeline

Stages count_v1.{safetensors,onnx,temperature,train_results.json}
ahead of the build/sign/upload step. This is a transitional commit:
the next commit will refresh the per-arch manifests with the new
binary SHAs once ruvultra finishes the cross-build.

The .temperature file holds the calibration scalar T, fitted with LBFGS
on the held-out confidence (conf) logits. The Rust cog will read it
post-load and divide conf_logits by T before the sigmoid, exactly
matching the Python eval.
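
For reference, the read-and-divide step can be sketched in Python as below. This is a minimal sketch, not the cog's actual code: it assumes the sidecar is a plain-text file containing a single float, and the function names (read_temperature, calibrated_confidence) are hypothetical.

```python
import math
from pathlib import Path


def read_temperature(path):
    """Read the calibration scalar T (assumed format: plain text, one float)."""
    value = float(Path(path).read_text().strip())
    if not value > 0.0:
        raise ValueError(f"temperature must be positive, got {value}")
    return value


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def calibrated_confidence(conf_logits, temperature):
    """Divide each confidence logit by T, then squash with sigmoid."""
    return [sigmoid(z / temperature) for z in conf_logits]
```

With T below 1 the logits are stretched, so confidences move away from 0.5; with T above 1 they are pulled toward it. A logit of exactly 0 maps to 0.5 for any T.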

  • feat(cog-person-count): v0.0.2 — K-fold validated, label smoothing + early stop + temp scale

The v0.0.1 "65.1% but class-1=0%" result came from an unlucky temporal
split that let a degenerate "always predict 0" classifier score an eval
accuracy equal to the class-0 fraction. 5-fold stratified random CV
showed the architecture actually reaches ~57.1% class-1 accuracy under
fair splits: a real, modestly useful signal.
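
The degenerate case is easy to reproduce. The sketch below uses synthetic labels (651 class-0 and 349 class-1 samples, sized only to match the 65.1% figure; the real eval set size is not stated here) and a hypothetical per_class_accuracy helper.

```python
def per_class_accuracy(y_true, y_pred):
    """Return overall accuracy and a dict of accuracy per true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    per_class = {}
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)
    return correct / len(y_true), per_class


# Synthetic eval window (illustrative only, not the project's data).
y_true = [0] * 651 + [1] * 349
y_pred = [0] * len(y_true)  # degenerate classifier: always predict 0

overall, per_class = per_class_accuracy(y_true, y_pred)
print(f"overall={overall:.3f} class0={per_class[0]:.3f} class1={per_class[1]:.3f}")
# prints: overall=0.651 class0=1.000 class1=0.000
```

Overall accuracy alone hides the failure; the per-class breakdown is what exposes it.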

v0.0.2 ships a retrained model with:

  • Random 80/20 split (seed=42) instead of a temporal one, which
    eliminates the trailing-window-class-imbalance cheat.
  • Class-balanced sampler (multinomial with replacement, weighted by
    inverse class frequency) — per-batch expected counts are equal
    regardless of dataset distribution.
  • Label smoothing 0.1 on the cross-entropy — reduces confidence
    saturation that drove v0.0.1's all-or-nothing predictions.
  • Early stopping with patience=20 — stops at epoch 29 instead of
    overfitting through 400.
  • Temperature scaling of the conf head — LBFGS fits a scalar T on
    held-out conf logits; ships as a count_v1.temperature sidecar so the
    Rust cog can divide conf_logits by T before sigmoid.
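
Three of the training changes above (class-balanced sampling, label smoothing, early stopping) can be sketched with the standard library alone. This is an illustration under assumptions, not the code in scripts/train-count.py: all names are hypothetical, the smoothing follows the common "(1 - s) * one_hot + s / K" convention, and early stopping is assumed to monitor a validation loss.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weight 1/count(class), so every class carries equal total weight."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def balanced_batch(labels, batch_size, rng):
    """Draw sample indices with replacement, weighted by inverse class frequency."""
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against (1 - s) * one_hot + s / K uniform targets."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    k = len(logits)
    loss = 0.0
    for cls, v in enumerate(logits):
        q = smoothing / k + (1.0 - smoothing if cls == target else 0.0)
        loss -= q * (v - log_z)  # v - log_z is the log-probability of cls
    return loss


class EarlyStopper:
    """Signal a stop after `patience` consecutive epochs without a new best loss."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With smoothing enabled, a very confident correct prediction is penalized slightly more than without it, which is the mechanism that discourages saturated outputs.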

Numbers on the same data:

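| Metric      | v0.0.1 | v0.0.2 | K-fold (5x100) |
|-------------|--------|--------|----------------|
| Overall acc | 65.1%  | 62.3%  | 62.2% ± 1.9%   |
| Class 0 acc | 100%   | 86.2%  | 67.4%          |
| Class 1 acc | 0%     | 34.3%  | 57.1% ✓        |
| MAE         | 0.349  | 0.377  | 0.378          |
| Spearman    | 0.023  | 0.013  | 0.160          |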

Class-1 accuracy going from 0% to 34.3% is the headline win. Overall
accuracy drops slightly (65.1% to 62.3%) because we stopped cheating on
class 0. K-fold's 57% class-1 accuracy says there's headroom remaining;
reaching it needs more independent splits (== more data), not more
training tricks.

Confidence calibration didn't move. Temperature scaling alone can't fix
a confidence head trained against a noisy argmax==truth indicator over
a 62%-accurate classifier — the head's training signal is the issue,
not its post-hoc transform. The honest fix is multi-room data (#645),
not another calibration knob.
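
To make the post-hoc transform concrete, here is a minimal sketch of fitting the scalar T against 0/1 "argmax==truth" targets. The commit uses LBFGS; this sketch swaps in a golden-section search over 1/T to stay dependency-free, which is valid because the binary NLL is convex in 1/T. The function names and search bounds are assumptions.

```python
import math


def binary_nll(conf_logits, targets, inv_t):
    """Mean binary cross-entropy of sigmoid(inv_t * z) against 0/1 targets."""
    total = 0.0
    for z, y in zip(conf_logits, targets):
        s = inv_t * z if y == 1 else -inv_t * z
        # log(1 + exp(-s)), written to avoid overflow for large |s|
        total += max(-s, 0.0) + math.log1p(math.exp(-abs(s)))
    return total / len(targets)


def fit_temperature(conf_logits, targets, lo=0.01, hi=100.0, iters=100):
    """Golden-section search over 1/T within [1/hi, 1/lo]; returns T."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 1.0 / hi, 1.0 / lo
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if binary_nll(conf_logits, targets, c) < binary_nll(conf_logits, targets, d):
            b, d = d, c  # minimum lies in [a, d]
            c = b - phi * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]
            d = a + phi * (b - a)
    return 2.0 / (a + b)  # T = 1 / midpoint of the final bracket on 1/T


# Toy check: logits of +/-2 that are right 75% of the time.
# Analytically the optimum satisfies sigmoid(2 / T) = 0.75, so T = 2 / ln(3).
logits = [2.0] * 4 + [-2.0] * 4
targets = [1, 1, 1, 0, 0, 0, 0, 1]
print(round(fit_temperature(logits, targets), 3))
```

Because dividing by a positive T is monotonic, the fit can rescale confidences but cannot change which samples the head ranks as more confident than others. That is consistent with the point above: if the head's ranking is uninformative, no value of T repairs it.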

Live on cognitum-v0 at /var/lib/cognitum/apps/person-count/ — health
reports candle-cpu backend, count = 1 (was 0 in v0.0.1) on synthetic
zero input.

Files changed:

  • scripts/train-count.py — adds --k-fold (no sklearn dep, hand-rolled
    stratified splits with deterministic shuffle) and --v2 paths.
  • v2/.../cog/artifacts/count_v1.safetensors (392 KB, new sha
    32996433…) + count_v1.onnx (16 KB) + count_v1.temperature (0.9262
    scalar) + count_train_results.json (full epoch trace).
  • v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json bumped to
    version 0.0.2 with the new weights_sha256 + caveats.
  • docs/benchmarks/person-count-cog.md — appends a v0.0.2 section
    with the K-fold diagnostic table and honest-read paragraph.
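
A hand-rolled stratified split of the kind described for --k-fold might look like the sketch below. It is not the script's actual code: the function name, the round-robin dealing strategy, and the default seed are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs with class proportions preserved per fold."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)  # fixed seed gives a deterministic shuffle
    folds = [[] for _ in range(k)]
    for cls in sorted(by_class):
        idx = by_class[cls]
        rng.shuffle(idx)
        # Deal shuffled indices round-robin so each fold gets a near-equal share.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)

    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val
```

Each sample lands in exactly one validation fold, and per-class counts differ by at most one across folds, so no fold can end up with a single class the way a trailing temporal window can.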

GCS:
gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors
refreshed (binaries unchanged; they load the weights via mmap at runtime).

Docker Image:
ghcr.io/ruvnet/RuView:b3a5012dbd1db43c444b12d1eae367f95c71ba4e
