Better OpenAI prompt caching + fixed token reporting
This release has two changes for OpenAI:
Better prompt caching
This release adds proper support for the OpenAI Responses API's store: true and previous_response_id fields. With these fields, you can ask OpenAI to store the conversation server-side as it happens, and continue from it by passing the previous_response_id in the next request. This means a faster agent and lower token usage.
You can read more about it here:
- https://developers.openai.com/cookbook/examples/prompt_caching_201
- https://developers.openai.com/api/docs/guides/migrate-to-responses
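To make the flow concrete, here's a minimal sketch of what the request bodies look like. The store and previous_response_id field names come from the Responses API; the model name, prompt strings, and the newResponsesBody helper are illustrative, not part of this release.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// newResponsesBody builds a JSON request body for the OpenAI Responses API.
// store: true asks OpenAI to persist the response server-side; on follow-up
// turns, previous_response_id continues the stored conversation instead of
// resending the full history.
func newResponsesBody(input, prevID string) map[string]any {
	body := map[string]any{
		"model": "gpt-4o", // placeholder model name
		"input": input,
		"store": true, // persist this turn server-side
	}
	if prevID != "" {
		// Continue from the stored conversation.
		body["previous_response_id"] = prevID
	}
	return body
}

func main() {
	// First turn: nothing to continue from yet.
	first, _ := json.Marshal(newResponsesBody("Hello!", ""))
	fmt.Println(string(first))

	// Later turns: pass the response.id returned by the previous call.
	next, _ := json.Marshal(newResponsesBody("And now?", "resp_123"))
	fmt.Println(string(next))
}
```

Because the prior turns live on OpenAI's side, each follow-up request carries only the new input plus an ID, which is where the speed and token savings come from.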
Fixed OpenAI token reporting
We fixed the usage.InputTokens field, which was higher than expected: cached tokens were being double counted. From this release on, the cached tokens are subtracted from that value, so the reported number is correct.
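The fix itself is simple arithmetic; a sketch of the idea (the function name and sample numbers here are illustrative, not the actual code from the release):

```go
package main

import "fmt"

// correctedInputTokens subtracts cached tokens from the raw input token
// count so cached tokens are not counted twice. In the OpenAI usage object,
// the cached count is reported separately from the total input tokens.
func correctedInputTokens(inputTokens, cachedTokens int64) int64 {
	return inputTokens - cachedTokens
}

func main() {
	// e.g. 1200 input tokens reported, 800 of them served from the cache
	fmt.Println(correctedInputTokens(1200, 800)) // prints 400
}
```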
Keep fantasizing 👻
Charm
Changelog
New!
- 0c8663f: feat(openai): add Responses API store, previous_response_id, and response.id support (#175) (@ibetitsmike)
Fixed
- 22c3e9a: fix(openai): subtract cached tokens from input tokens to avoid double counting (#176) (@andreynering)
Thoughts? Questions? We love hearing from you. Feel free to reach out on X, Discord, Slack, the Fediverse, or Bluesky.