New features
- Initial `async` integrations (#198, #236) thanks to @lucasavila00.
- More flexibility with `bos` and `eos` tokens (#248)
- Intermediate loading for ISQ models on CPU (#229)
- Fixed Phi 3 128k finally, it is fully working now! (#251)
Changelog
- Update README.md by @KPCOFGS in #224
- Fix api_dir_list! and show better error by @EricLBuehler in #225
- Default to `none` when cannot find token by @EricLBuehler in #226
- docs: update ADAPTER_MODELS.md by @eltociear in #227
- Fix debug log timing of first token by @lucasavila00 in #231
- Implement intermediate loading for ISQ on CPU by @EricLBuehler in #229
- Async sampling by @lucasavila00 in #198
- Fix quantized example by @lucasavila00 in #237
- Source bos, eos tokens from generation_config.json by @EricLBuehler in #243
- Sliding window for phi3 by @EricLBuehler in #244
- Fix docker images by @LLukas22 in #249
- Remove forced max seq len for llama models by @EricLBuehler in #250
- Fix Phi3 128k finally: use position ids to switch between short/long scaling by @EricLBuehler in #251
- Update README.md by @criminact in #253
- Async channels by @lucasavila00 in #236
New Contributors
- @KPCOFGS made their first contribution in #224
- @eltociear made their first contribution in #227
- @criminact made their first contribution in #253
Full Changelog: v0.1.0...v0.1.2