[0.18.0] - 2026-04-26
Fixed
- Reasoning-token pricing semantics (#32). Two correctness bugs that distorted reported spend whenever reasoning tokens were involved:
  - Codex `usage.reasoning` was double-billed at the output rate even though Codex's `output_tokens` already includes reasoning. `burn` now treats Codex turns as `included_in_output` and bills `output` only. On a 10-turn Codex sample (660k input / 53k output / 29k reasoning / 5.6M cacheRead), this drops the reported cost from $4.282607 to $3.846557; the old figure overstated the Codex slice by about 11%.
  - `cost.reasoning` from the `models.dev` snapshot was discarded during `flatten()`, so any model with a distinct reasoning tariff (e.g. Alibaba Qwen reasoning models) couldn't be priced correctly. The flattener now preserves `reasoning` and tags the entry `reasoningMode: 'separate'`; `costForUsage` honors the distinct tariff.
- Waste-attribution session totals now honor the same reasoning-mode semantics as `costForTurn`. `attributeWaste` previously had a private `costForTurnLocal` that unconditionally billed reasoning at the output rate, which double-billed Codex turns and ignored separate reasoning tariffs in `sessionGrand` / `grandCost` / `unattributedCost`. It now delegates to `costForTurn`, so waste totals match `cost.ts` for any session involving reasoning tokens (Devin review on #73).
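
The effect of the three reasoning modes on a bill can be illustrated with a minimal TypeScript sketch. The `Usage` and `Rates` shapes, the `sketchCost` helper, and all numbers are hypothetical and chosen for illustration; they are not `burn`'s actual types or the real `costForUsage` implementation, and cache tokens are left out for brevity.

```typescript
// Hypothetical shapes for illustration only; burn's real types may differ.
type ReasoningMode = 'included_in_output' | 'separate' | 'same_as_output';

interface Usage {
  input: number;     // input tokens
  output: number;    // output tokens
  reasoning: number; // reasoning tokens
}

interface Rates {
  input: number;      // USD per million input tokens
  output: number;     // USD per million output tokens
  reasoning?: number; // USD per million reasoning tokens, if distinct
}

const PER_MILLION = 1_000_000;

// Sketch of how the reasoning mode changes what gets billed.
function sketchCost(usage: Usage, rates: Rates, mode: ReasoningMode): number {
  const base =
    (usage.input * rates.input + usage.output * rates.output) / PER_MILLION;
  switch (mode) {
    case 'included_in_output':
      // Reasoning is already counted inside output tokens: bill nothing extra.
      return base;
    case 'separate':
      // Distinct tariff; falling back to the output rate is an assumption here.
      return base + (usage.reasoning * (rates.reasoning ?? rates.output)) / PER_MILLION;
    case 'same_as_output':
      // Reasoning is reported apart from output but priced at the output rate.
      return base + (usage.reasoning * rates.output) / PER_MILLION;
  }
}

// Made-up numbers, not the 10-turn sample quoted in this changelog.
const usage: Usage = { input: 1_000_000, output: 100_000, reasoning: 50_000 };
const rates: Rates = { input: 1, output: 10, reasoning: 4 };

const included = sketchCost(usage, rates, 'included_in_output');
const separate = sketchCost(usage, rates, 'separate');
const sameAsOutput = sketchCost(usage, rates, 'same_as_output');
console.log({ included, separate, sameAsOutput });
```

With these made-up rates, applying `same_as_output` to a turn whose output already contains its reasoning tokens adds a second charge for those tokens, which is the shape of the double-billing bug described above.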
Added
- `ModelCost.reasoningMode: 'included_in_output' | 'separate' | 'same_as_output'` and optional `reasoning` per-million tariff. `ReasoningMode` and `CostForUsageOptions` are exported.
- `costForUsage(usage, model, pricing, { reasoningMode })` accepts an explicit override. `costForTurn` infers `included_in_output` for `source: 'codex'` automatically.
- `flatten` is now exported so callers can build `PricingTable`s from in-memory `models.dev` payloads.
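
As a rough illustration of how a flattener might preserve the reasoning tariff and how a mode could then be resolved per turn, here is a self-contained sketch. Everything in it is an assumption: the simplified `Snapshot` shape, the `sketchFlatten` and `sketchResolveMode` helpers, the `provider/model` key format, the `same_as_output` default for entries without a distinct tariff, and the precedence of an explicit override over the inferred Codex mode. It does not reproduce the exported `flatten`, `costForUsage`, or `costForTurn`.

```typescript
type ReasoningMode = 'included_in_output' | 'separate' | 'same_as_output';

// Assumed, simplified snapshot shape; a real models.dev payload has more fields.
interface SnapshotModel {
  id: string;
  cost?: { input: number; output: number; reasoning?: number };
}
interface Snapshot {
  [provider: string]: { models: Record<string, SnapshotModel> };
}

interface FlatCost {
  input: number;      // USD per million input tokens
  output: number;     // USD per million output tokens
  reasoning?: number; // USD per million reasoning tokens, if distinct
  reasoningMode: ReasoningMode;
}

// Hypothetical flattener: keep `reasoning` and tag entries that carry it as 'separate'.
function sketchFlatten(snapshot: Snapshot): Map<string, FlatCost> {
  const table = new Map<string, FlatCost>();
  for (const [provider, { models }] of Object.entries(snapshot)) {
    for (const model of Object.values(models)) {
      if (!model.cost) continue; // skip unpriced models
      const { input, output, reasoning } = model.cost;
      table.set(`${provider}/${model.id}`, {
        input,
        output,
        ...(reasoning !== undefined ? { reasoning } : {}),
        // The default for entries without a distinct tariff is an assumption.
        reasoningMode: reasoning !== undefined ? 'separate' : 'same_as_output',
      });
    }
  }
  return table;
}

// Hypothetical resolver: explicit override first, then turn source, then the table entry.
function sketchResolveMode(
  entry: FlatCost,
  source?: string,
  override?: ReasoningMode,
): ReasoningMode {
  if (override) return override;
  if (source === 'codex') return 'included_in_output';
  return entry.reasoningMode;
}

// Made-up provider and model names.
const demo: Snapshot = {
  'example-provider': {
    models: {
      'plain-1': { id: 'plain-1', cost: { input: 1, output: 4 } },
      'reasoner-1': { id: 'reasoner-1', cost: { input: 1, output: 4, reasoning: 2 } },
    },
  },
};

const table = sketchFlatten(demo);
const reasoner = table.get('example-provider/reasoner-1')!;
console.log(reasoner.reasoningMode);                           // 'separate'
console.log(sketchResolveMode(reasoner, 'codex'));             // 'included_in_output'
console.log(sketchResolveMode(reasoner, 'codex', 'separate')); // 'separate'
```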