Foundry Local Release Notes: v0.8.101 🚀
✨ New Features
Improved performance for multi-turn conversations on macOS, especially time to first token, with the addition of the continuous decoding feature. Only new tokens are sent to the model instead of the entire conversation; previous inputs and responses are retained by the model in its KV cache.
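The benefit is easiest to see as token accounting. The sketch below (not the Foundry Local implementation, and using hypothetical per-turn token counts) contrasts re-encoding the whole conversation each turn with continuous decoding, where earlier turns already sit in the KV cache and only new tokens are processed:

```python
def tokens_processed(turn_lengths, continuous_decoding):
    """Return total prompt tokens the model must encode across all turns.

    turn_lengths: tokens added per turn (user message + model reply).
    """
    total = 0
    history = 0
    for new_tokens in turn_lengths:
        if continuous_decoding:
            total += new_tokens            # only the new tokens are sent
        else:
            total += history + new_tokens  # whole conversation re-encoded
        history += new_tokens              # conversation grows either way
    return total

turns = [120, 80, 100, 60]  # hypothetical per-turn token counts
baseline = tokens_processed(turns, continuous_decoding=False)
cached = tokens_processed(turns, continuous_decoding=True)
print(baseline, cached)  # prints "980 360"
```

With caching the prompt-processing cost stays proportional to the new tokens per turn rather than to the full conversation length, which is why time to first token improves most on long multi-turn chats.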
📝 Known issues
When the context length (set by the `max_length` value) is exhausted, an exception is thrown instead of a warning or error message being shown.
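Until this is fixed, one generic client-side mitigation is to trim the oldest turns before each request so the conversation stays under the context budget. The sketch below is not a Foundry Local API; `count_tokens` is a crude word-count stand-in for a real tokenizer, and `max_length` here is assumed to be a token budget:

```python
def count_tokens(message):
    # Crude stand-in: a real client would use the model's tokenizer.
    return len(message["content"].split())

def trim_history(messages, max_length):
    """Keep the most recent messages whose combined size fits max_length."""
    kept, used = [], 0
    for message in reversed(messages):  # walk newest to oldest
        cost = count_tokens(message)
        if used + cost > max_length:
            break                       # oldest turns are dropped first
        kept.append(message)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = [
    {"role": "user", "content": "first question about the model"},
    {"role": "assistant", "content": "a long detailed answer " * 10},
    {"role": "user", "content": "short follow up"},
]
trimmed = trim_history(history, max_length=20)
```

Note that dropping old turns discards context the model may still need, so the budget should be set with some headroom rather than at the exact limit.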