## What's Changed
- chore: bump llama.cpp to support tool streaming by @p5 in #1438
- Bump to v0.8.5 by @rhatdan in #1439
- fix: update references to Python 3.8 to Python 3.11 by @nathan-weinberg in #1441
- Fix quadlet handling of duplicate options by @olliewalsh in #1442
- fix(gguf_parser): fix big endian model parsing by @taronaeo in #1444
- Choice could be unset and should not be used by @rhatdan in #1447
- fix(run): Ensure 'run' subcommand works with host proxy settings. by @melodyliu1986 in #1430 (see the proxy sketch below)
- Switch default ramalama image build to use VULKAN by @rhatdan in #1449
- make ramalama-client-core send default model to server by @rhatdan in #1450 (see the client sketch below)
- fix(gguf_parser): fix MemoryError exception when loading non-native models by @taronaeo in #1452
- Small logging improvements by @almusil in #1455
- feat(model_store): prevent model endianness mismatch on download by @taronaeo in #1454 (see the endianness sketch below)
- Add support for llama-stack by @rhatdan in #1413
- Refactor huggingface.py and modelscope.py, extracting repo_model_base.py by @yeahdongcn in #1456
- Eliminate selinux-policy packages from containers by @rhatdan in #1451
- Snapshot verification by @engelmi in #1458
- Add support for generating kube.yaml and quadlet/kube files for llama… by @rhatdan in #1457 (see the quadlet sketch below)
- Bump to v0.9.0 by @rhatdan in #1462
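
A few of the entries above are easier to follow with a short sketch. For the host proxy fix in #1430, the general technique is to forward the host's proxy environment into the container. A minimal Python sketch, assuming the conventional proxy variable names; the helper below is hypothetical, not RamaLama's actual code:

```python
import os

# Conventional proxy variables honored by most tools (hypothetical helper,
# illustrating the general technique rather than RamaLama's implementation).
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy",
              "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")

def proxy_env_args():
    """Build `-e VAR=value` arguments so a container inherits host proxy settings."""
    args = []
    for var in PROXY_VARS:
        value = os.environ.get(var)
        if value:
            args.extend(["-e", f"{var}={value}"])
    return args

# Example: ["podman", "run", "-e", "https_proxy=...", "quay.io/ramalama/ramalama"]
cmd = ["podman", "run"] + proxy_env_args() + ["quay.io/ramalama/ramalama"]
```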
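For #1450, the client pattern is to fall back to a default model name so the server always receives one. A sketch against an OpenAI-compatible /v1/chat/completions endpoint; the `DEFAULT_MODEL` constant and `chat()` helper are illustrative assumptions, not the real client's API:

```python
import json
import urllib.request

DEFAULT_MODEL = "default"  # placeholder; the real client's default is an assumption here

def chat(prompt, model=None, base_url="http://127.0.0.1:8080"):
    """Send a chat completion request, falling back to a default model name."""
    payload = {
        "model": model or DEFAULT_MODEL,  # always send *some* model to the server
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```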
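For the endianness work in #1444, #1452, and #1454, one common heuristic is to decode the uint32 version field that follows the 4-byte GGUF magic in both byte orders and trust whichever yields the smaller (plausible) value. A sketch under that assumption, not a quote of RamaLama's gguf_parser:

```python
import struct
import sys

def gguf_byteorder(path):
    """Guess a GGUF file's byte order from its header (heuristic sketch)."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        raw = f.read(4)
    # The version is a small integer; whichever decoding yields the smaller
    # value tells us how the rest of the header was written.
    little = struct.unpack("<I", raw)[0]
    big = struct.unpack(">I", raw)[0]
    return "little" if little < big else "big"

# Refusing a mismatched model, in the spirit of the #1454 download check:
if gguf_byteorder("model.gguf") != sys.byteorder:
    raise RuntimeError("model endianness does not match this host")
```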
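For #1457, the generated pair is a Kubernetes YAML plus a quadlet `.kube` unit whose `[Kube]` section points systemd at that YAML. A sketch that writes both files; the file names and pod spec here are made up, while the `Yaml=` key is standard Podman quadlet syntax:

```python
from pathlib import Path

# Hypothetical output paths; real names depend on the model being served.
kube_yaml = Path("ramalama.yaml")
quadlet = Path("ramalama.kube")

# A minimal pod spec standing in for whatever the generator actually emits.
kube_yaml.write_text("""\
apiVersion: v1
kind: Pod
metadata:
  name: ramalama
spec:
  containers:
  - name: server
    image: quay.io/ramalama/ramalama
""")

# A quadlet .kube unit points systemd at the kube YAML via the [Kube] section.
quadlet.write_text(f"""\
[Unit]
Description=RamaLama pod

[Kube]
Yaml={kube_yaml.resolve()}

[Install]
WantedBy=default.target
""")
```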
## New Contributors
Full Changelog: v0.8.5...v0.9.0