v5.4.9 — Accurate token counts (streaming dedup + subagent roll-up)
This release fixes the root cause of Token Optimizer's output-token undercount.
The bug
Claude Code writes multiple assistant records per API call during streaming. Each chunk's usage.output_tokens is the cumulative count up to that chunk. Previous versions deduplicated by requestId and kept only the FIRST record per request — discarding the final cumulative total.
Verified across 30 days of local JSONL: 48,595 requestIds had monotonically-increasing output values. Keeping first instead of max discarded ~28M output tokens on a single dashboard's 30-day window.
The fix
_parse_session_jsonl now tracks per-requestId MAX usage and applies totals once after the file loop completes. Same requestId = same API call = single final count. Different requestIds always count as separate calls (previous version could collide records without requestId under the same dedup key).
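The max-per-requestId idea can be sketched as follows. This is a minimal illustration, not the actual `_parse_session_jsonl` code: the flat record layout (`requestId` and `usage` at the top level), the helper name `sum_output_tokens`, and the `req:`/`anon:` key prefixes are all assumptions made for the example.

```python
import json
from typing import Iterable


def sum_output_tokens(lines: Iterable[str]) -> int:
    """Sum output tokens, counting each API call once at its highest cumulative value.

    Hypothetical helper; assumes each JSONL record looks like
    {"requestId": "...", "usage": {"output_tokens": N}}.
    """
    max_by_request: dict[str, int] = {}
    anon_counter = 0  # gives records without a requestId their own unique key

    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the whole file
        if not isinstance(record, dict):
            continue
        usage = record.get("usage")
        if not isinstance(usage, dict):
            continue
        tokens = int(usage.get("output_tokens") or 0)

        request_id = record.get("requestId")
        if request_id:
            key = f"req:{request_id}"
        else:
            anon_counter += 1
            key = f"anon:{anon_counter}"

        # Streaming chunks are cumulative, so the max is the final total.
        max_by_request[key] = max(max_by_request.get(key, 0), tokens)

    # Totals are applied once, after the whole file has been read.
    return sum(max_by_request.values())
```

Keeping the maximum rather than the last-seen value means the result does not depend on the order in which chunks were written to the file.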
Secondary fix: subagent output rolled into session totals
Subagent tokens used to land only in model_daily (for the per-model stacked bar). They never made it into session_log.output_tokens, so the dashboard headline, which is derived from that field, missed ~60% of output on sessions that delegated heavily via the Task tool.
collect_sessions now aggregates subagent input, output, and cache-creation tokens into the parent session's totals before writing to session_log.
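A rough sketch of the roll-up step, assuming each session and subagent is a plain dict of counters. The field names in `ROLLUP_FIELDS` and the function `roll_up_subagents` are hypothetical and chosen for illustration; the real `collect_sessions` may be structured differently.

```python
# Hypothetical field names; the real session_log columns may differ.
ROLLUP_FIELDS = ("input_tokens", "output_tokens", "cache_creation_tokens")


def roll_up_subagents(parent: dict, subagents: list[dict]) -> dict:
    """Return a copy of `parent` with subagent token counts added to its totals."""
    merged = dict(parent)  # copy so the caller's record is not mutated
    for field in ROLLUP_FIELDS:
        merged[field] = parent.get(field, 0) + sum(
            sub.get(field, 0) for sub in subagents
        )
    return merged
```

For example, a parent with 40 output tokens and two subagents with 60 and 10 would be written to the session log with 110 output tokens.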
Measured impact (Alex's 30d dashboard)
| Metric | Pre-v5.4.9 | v5.4.9 | Desktop |
|---|---|---|---|
| Sessions | 746 | 746 | 737 |
| Messages | 105,743 | 115,035 | 107,901 |
| Output tokens | 6.2M | 45.4M | 66.3M |
| Billable tokens | 189M (3x over) | 546M | (n/a — Desktop uses different methodology) |
Output is now within 1.5x of Desktop. Previously it was 10x under.
Honest labeling
The "Total Tokens" card now reads "Billable Tokens — Fresh input + cache writes + output (what your invoice bills at full rate)" with a breakdown line showing each component. We no longer claim false parity with Desktop's opaque "Total tokens" metric; we show what our math actually measures.
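The card's arithmetic can be illustrated with a small sketch. The `Usage` class and its field names are hypothetical. Treating cache reads as excluded is an assumption drawn from the label, which lists only fresh input, cache writes, and output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical per-window token counters."""
    fresh_input: int
    cache_writes: int
    output: int
    cache_reads: int = 0  # assumed to be tracked but left out of the billable sum

    @property
    def billable(self) -> int:
        # Billable = fresh input + cache writes + output, per the card label.
        return self.fresh_input + self.cache_writes + self.output

    def breakdown(self) -> str:
        """One-line component breakdown, similar in spirit to the card's detail line."""
        return (
            f"{self.fresh_input:,} input + {self.cache_writes:,} cache writes "
            f"+ {self.output:,} output = {self.billable:,} billable"
        )
```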
Auto-regen hardening
SessionStart's dashboard staleness check now verifies both the version marker AND the v5.4.9 data-shape marker. Future releases with the same version but different data layout will still trigger regen. Read buffer raised from 32KB to 256KB to cover large embedded data blobs.
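A minimal sketch of a two-marker staleness check with a 256KB read buffer. The function name `dashboard_is_stale` and the marker strings passed to it are placeholders; the actual markers and the SessionStart hook code are not shown in these notes.

```python
READ_BUFFER_BYTES = 256 * 1024  # raised from 32KB so large embedded blobs are covered


def dashboard_is_stale(path: str, version_marker: str, shape_marker: str) -> bool:
    """Return True if the dashboard file should be regenerated.

    The file counts as fresh only when BOTH markers appear within the
    first READ_BUFFER_BYTES bytes. Marker values are caller-supplied
    placeholders in this sketch.
    """
    try:
        with open(path, "rb") as fh:
            head = fh.read(READ_BUFFER_BYTES)
    except OSError:
        return True  # missing or unreadable dashboard: rebuild

    has_version = version_marker.encode("utf-8") in head
    has_shape = shape_marker.encode("utf-8") in head
    return not (has_version and has_shape)
```

Requiring both markers means a file carrying the right version string but an older data layout is still treated as stale.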
Claude Code changelog scan (2.1.91-2.1.109)
Reviewed for features overlapping Token Optimizer. Nothing breaking, but noted for the roadmap:
- `/cost` now shows per-model + cache-hit breakdown for subscribers (2.1.92): we still add savings tracking and optimization guidance
- Background `monitors` manifest key for plugins (2.1.105): could replace our custom daemon approach in a future release
- `/recap` command with auto-away summary (2.1.108): adjacent to our checkpoint enrichment, no conflict
Tests
104 passing, 3 new regression tests:
- `test_monotonic_streaming_uses_max`: ensures MAX-based dedup
- `test_identical_duplicates_dedup`: ensures non-streaming dupes still collapse
- `test_records_without_requestid_do_not_collapse`: the counter-key fix
Upgrade
Claude Code auto-updates the plugin on version bump. SessionStart rebuilds the dashboard with v5.4.9 math automatically. To force a refresh right now, run `python3 measure.py collect --rebuild && python3 measure.py dashboard`.