Guidance 0.2.3
This release is mainly a performance hotfix, plus a few extras we snuck in.
Added
- Added the Llama 3.2 chat template (a usage sketch follows below)
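
For context, a minimal usage sketch assuming a locally available Llama 3.2 instruct checkpoint loaded through the Transformers backend; the model id below is illustrative, and the chat template is picked up automatically from the tokenizer:

```python
from guidance import models, user, assistant, gen

# Illustrative model id; any Llama 3.2 instruct checkpoint should work.
lm = models.Transformers("meta-llama/Llama-3.2-1B-Instruct")

with user():
    lm += "What is the capital of France?"

with assistant():
    lm += gen("answer", max_tokens=20)

print(lm["answer"])
```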
Removed
- Deleted some dead code, in particular `sample_with_temperature` from the Engine classes
Changed
- Switched the widget's top-k implementation to a priority queue instead of a full sort, saving a few milliseconds per token when the widget/visualization is turned on (see the sketch below)
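
For the curious, here is the idea in a minimal, self-contained sketch (not the actual widget code): selecting the k largest logits with a heap is O(n log k), versus O(n log n) for a full sort, which matters when k is small and the vocabulary is large.

```python
import heapq
import random

def top_k_full_sort(logits, k):
    # Baseline: sort all n values, then take the first k -- O(n log n).
    return sorted(enumerate(logits), key=lambda kv: kv[1], reverse=True)[:k]

def top_k_heap(logits, k):
    # Priority-queue version: keep only the k largest -- O(n log k).
    return heapq.nlargest(k, enumerate(logits), key=lambda kv: kv[1])

logits = [random.gauss(0, 1) for _ in range(32000)]  # vocab-sized vector
assert sorted(top_k_heap(logits, 5)) == sorted(top_k_full_sort(logits, 5))
```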
Fixed
- Fixed the performance regression introduced in issue #1261: the full logits history is no longer cached, and fast-forwarded token probabilities are now only available (in the widget) the first time they are added to the KV cache; they will be missing otherwise. A sketch of this behavior follows below.
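
To illustrate the new behavior, here is a minimal sketch; this is not Guidance's actual internals, and all names here are hypothetical. The point is that a token's probability is recorded once, at the moment it first enters the KV cache, instead of retaining every logits vector:

```python
class TokenProbRecorder:
    """Hypothetical recorder: stores at most one probability per position."""

    def __init__(self) -> None:
        self._probs: dict[int, float] = {}  # position -> probability

    def on_kv_append(self, position: int, prob: float) -> None:
        # Record only on first insertion; fast-forwarded positions that
        # never re-enter the cache simply stay missing.
        self._probs.setdefault(position, prob)

    def prob_at(self, position: int) -> float | None:
        return self._probs.get(position)
```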