github alexgreensh/token-optimizer v5.4.9

one month ago

v5.4.9 — Accurate token counts (streaming dedup + subagent roll-up)

This release fixes the root cause of Token Optimizer's output-token undercount.

The bug

Claude Code writes multiple assistant records per API call during streaming. Each chunk's usage.output_tokens is the cumulative count up to that chunk. Previous versions deduplicated by requestId and kept only the FIRST record per request — discarding the final cumulative total.

Verified across 30 days of local JSONL: 48,595 requestIds had monotonically increasing output values. Keeping the first record instead of the max discarded ~28M output tokens in a single dashboard's 30-day window.

The fix

_parse_session_jsonl now tracks the MAX usage per requestId and applies totals once, after the file loop completes. Same requestId = same API call = single final count. Different requestIds always count as separate calls; the previous version could collapse records that lack a requestId under one shared dedup key.
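A minimal sketch of the max-based dedup described above, not the plugin's actual code. The flat record layout (a top-level `requestId` and `usage.output_tokens`) is simplified from the description in these notes. The function name `sum_output_tokens` and the per-line fallback key for records without a requestId are assumptions made for illustration.

```python
import json


def sum_output_tokens(jsonl_lines):
    """Sum output tokens, counting each requestId once at its maximum value."""
    max_by_request = {}  # dedup key -> highest cumulative output_tokens seen
    for index, line in enumerate(jsonl_lines):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        usage = record.get("usage") or {}
        tokens = usage.get("output_tokens", 0)
        # Hypothetical fallback: a record without a requestId gets a unique
        # per-line key, so such records never collapse into one another.
        key = record.get("requestId") or ("no-request-id", index)
        max_by_request[key] = max(max_by_request.get(key, 0), tokens)
    # Totals are applied once, after the whole file has been read.
    return sum(max_by_request.values())


sample = [
    '{"requestId": "req_1", "usage": {"output_tokens": 10}}',
    '{"requestId": "req_1", "usage": {"output_tokens": 250}}',
    '{"requestId": "req_2", "usage": {"output_tokens": 40}}',
    '{"usage": {"output_tokens": 5}}',
    '{"usage": {"output_tokens": 5}}',
]
print(sum_output_tokens(sample))  # 250 + 40 + 5 + 5 = 300
```

On this sample, a keep-first strategy would count only 10 tokens for req_1 instead of the final 250, which is the undercount pattern the release describes.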

Secondary fix: subagent output rolled into session totals

Subagent tokens used to land only in model_daily (for the per-model stacked bar). They never made it into session_log.output_tokens, so the dashboard headline, which is derived from that field, missed ~60% of output on sessions that delegated heavily via the Task tool.

Now collect_sessions aggregates subagent input, output, and cache-creation tokens into the parent session's totals before writing to session_log.
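The roll-up can be pictured roughly as follows. This is an illustrative sketch only: the field names, the dictionary shapes, and the helper `roll_up_subagents` are assumptions, not the real collect_sessions internals.

```python
# Hypothetical field names for the three token types named in the notes.
TOKEN_FIELDS = ("input_tokens", "output_tokens", "cache_creation_tokens")


def roll_up_subagents(parent, subagents):
    """Return parent totals with every subagent's usage added in."""
    totals = {field: parent.get(field, 0) for field in TOKEN_FIELDS}
    for sub in subagents:
        for field in TOKEN_FIELDS:
            # Missing fields count as zero rather than raising.
            totals[field] += sub.get(field, 0)
    return totals


parent = {"input_tokens": 1000, "output_tokens": 400, "cache_creation_tokens": 200}
subagents = [
    {"input_tokens": 300, "output_tokens": 600, "cache_creation_tokens": 50},
    {"output_tokens": 100},
]
rolled = roll_up_subagents(parent, subagents)
print(rolled)
```

In this made-up example the parent alone reports 400 output tokens, while the rolled-up session reports 1,100, so a headline built from the parent row alone would miss most of the output.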

Measured impact (Alex's 30d dashboard)

| Metric | Pre-v5.4.9 | v5.4.9 | Desktop |
|---|---|---|---|
| Sessions | 746 | 746 | 737 |
| Messages | 105,743 | 115,035 | 107,901 |
| Output tokens | 6.2M | 45.4M | 66.3M |
| Billable tokens | 189M (3x over) | 546M | n/a (Desktop uses different methodology) |

Output is now within 1.5x of Desktop. Previously it was 10x under.
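The two ratios follow directly from the output-token figures reported for this dashboard (6.2M, 45.4M, and 66.3M). A quick check, with variable names chosen here for illustration:

```python
# Output-token figures from the measured-impact numbers, in millions.
pre_fix = 6.2
post_fix = 45.4
desktop = 66.3

ratio_before = desktop / pre_fix   # how far under Desktop the old count was
ratio_after = desktop / post_fix   # how far under Desktop the new count is

print(f"before: {ratio_before:.1f}x under, after: {ratio_after:.2f}x under")
```

The old count comes out a little over 10x below Desktop and the new count a little under 1.5x below it, which matches the rounded claims.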

Honest labeling

The "Total Tokens" card now reads "Billable Tokens — Fresh input + cache writes + output (what your invoice bills at full rate)" with a breakdown line showing each component. We no longer claim false parity with Desktop's opaque "Total tokens" metric; we show what our math actually measures.
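As a rough sketch of what the card's label describes, the billable figure is the sum of three components. The function names and the sample numbers below are illustrative assumptions. Cache reads are left out only because the label lists just these three components.

```python
def billable_tokens(fresh_input, cache_writes, output):
    """Fresh input + cache writes + output, per the card's label."""
    return fresh_input + cache_writes + output


def breakdown_line(fresh_input, cache_writes, output):
    """Format a one-line breakdown showing each component and the total."""
    total = billable_tokens(fresh_input, cache_writes, output)
    return (
        f"Fresh input {fresh_input:,} + cache writes {cache_writes:,} "
        f"+ output {output:,} = {total:,}"
    )


# Made-up numbers, purely to show the arithmetic.
line = breakdown_line(fresh_input=1200, cache_writes=300, output=500)
print(line)
```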

Auto-regen hardening

SessionStart's dashboard staleness check now verifies both the version marker AND the v5.4.9 data-shape marker. Future releases with the same version but different data layout will still trigger regen. Read buffer raised from 32KB to 256KB to cover large embedded data blobs.
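A minimal sketch of a two-marker staleness check along these lines. The marker strings, the file name, and the function `dashboard_is_stale` are hypothetical, since the notes do not give the real markers. Only the 32KB and 256KB buffer sizes come from the text above.

```python
import os
import tempfile

READ_BUFFER_BYTES = 256 * 1024  # raised from 32 KB per the release notes

# Hypothetical marker strings; the real markers are not given in the notes.
VERSION_MARKER = b"token-optimizer-version:5.4.9"
SHAPE_MARKER = b"data-shape:v5.4.9"


def dashboard_is_stale(path, buffer_size=READ_BUFFER_BYTES):
    """Return True when the dashboard file should be regenerated."""
    try:
        with open(path, "rb") as handle:
            head = handle.read(buffer_size)
    except FileNotFoundError:
        return True  # no dashboard yet, so build one
    # Both markers must be present; either one missing triggers a regen.
    return VERSION_MARKER not in head or SHAPE_MARKER not in head


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dashboard.html")
    # Simulate a large embedded data blob sitting between the two markers.
    with open(path, "wb") as handle:
        handle.write(VERSION_MARKER + b"x" * (100 * 1024) + SHAPE_MARKER)
    fresh_with_large_buffer = not dashboard_is_stale(path)
    stale_with_small_buffer = dashboard_is_stale(path, buffer_size=32 * 1024)
    missing_is_stale = dashboard_is_stale(os.path.join(tmp, "absent.html"))

print(fresh_with_large_buffer, stale_with_small_buffer, missing_is_stale)
```

The simulated 100 KB blob pushes the second marker past a 32 KB read, so the smaller buffer would wrongly report the file as stale. That is one plausible reason for raising the buffer, though the notes only say it covers large embedded data blobs.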

Claude Code changelog scan (2.1.91-2.1.109)

Reviewed for features overlapping Token Optimizer. Nothing breaking, but noted for the roadmap:

  • /cost now shows per-model + cache-hit breakdown for subscribers (2.1.92) — we still add savings tracking and optimization guidance
  • Background monitors manifest key for plugins (2.1.105) — could replace our custom daemon approach in a future release
  • /recap command with auto-away summary (2.1.108) — adjacent to our checkpoint enrichment, no conflict

Tests

104 passing, 3 new regression tests:

  • test_monotonic_streaming_uses_max — ensures MAX-based dedup
  • test_identical_duplicates_dedup — ensures non-streaming dupes still collapse
  • test_records_without_requestid_do_not_collapse — the counter-key fix

Upgrade

Claude Code auto-updates the plugin on version bump. SessionStart rebuilds the dashboard with v5.4.9 math automatically. If you want to force-refresh right now: python3 measure.py collect --rebuild && python3 measure.py dashboard.
