Introduces monitor guided inference and onnxruntime-genai support. Numerous fixes.
Added
- Support for Python 3.14 (#1385)
- Ability to add inference-time semantic verification via monitor guided inference (#1391) by @naga86
- onnxruntime-genai backend (#1366)
Removed
- Dropped Python 3.9 support
Fixed
- Remove code dealing with deprecated transformers.HybridCache (#1408)
- Ensure setup.py has consistent transformers and llama-cpp-python versions (#1405)
- Use chat template native functionality (#1387)
- Various build fixes and migrations
- Fix vLLM extra body handling after a format change (#1405) by @parkervg
- Asyncio event loop fix for Python 3.14+ (#1394) by @ngoldbaum
- Fix multi-line string indents (indent=True) for Python 3.14+ (#1395)
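
Background on the asyncio item above: starting with Python 3.14, `asyncio.get_event_loop()` raises `RuntimeError` when no current event loop exists instead of creating one implicitly. The sketch below shows one common way to cope with that in synchronous code. It is an illustration only, not the change made in #1394, and the helper names `get_or_create_event_loop` and `_answer` are hypothetical.

```python
import asyncio


def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    """Return the running loop if there is one, otherwise a fresh loop.

    Hypothetical helper: avoids relying on asyncio.get_event_loop()
    creating a loop implicitly, which no longer happens on Python 3.14+.
    """
    try:
        # Succeeds only when called from inside a running event loop.
        return asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so create one explicitly.
        return asyncio.new_event_loop()


async def _answer() -> int:
    # Yield to the loop once so the coroutine is actually scheduled.
    await asyncio.sleep(0)
    return 42


# Called from synchronous code, so a new loop is created and driven here.
loop = get_or_create_event_loop()
try:
    result = loop.run_until_complete(_answer())
finally:
    loop.close()
```

When the caller owns the whole program entry point, `asyncio.run()` is usually the simpler choice; an explicit loop like this is mainly useful when synchronous library code has to drive a coroutine itself.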