What's Changed
- Resolve streaming last token error and correct total token usage by @zenyr in #342
- Fix NameError in loglikelihood_rolling method by @snellingio in #339
- Fix error on unsupported response type in server by @emmanuel-ferdman in #344
- Add --trust-remote-code CLI option by @dojoteef in #319
- Add validation set for DWQ by @awni in #343
- Add --confirm-run-unsafe-code CLI option to allow execution of untrusted code by @ivanfioravanti in #348
- Allow per model quant config by @awni in #349
- Add gpt_oss model by @christian-lms in #354
- Jensen-Shannon divergence loss kernel by @vsabolcec in #352
- Route gpt_oss to fused SDPA by @angeloskath in #356
- Hunyuan V1 Dense model support by @ivanfioravanti in #351
- Add additional features for the gpt_oss model: LoRA, alternating attention, and MoE support by @Shashikant86 in #357
New Contributors
- @zenyr made their first contribution in #342
- @snellingio made their first contribution in #339
- @emmanuel-ferdman made their first contribution in #344
- @dojoteef made their first contribution in #319
- @vsabolcec made their first contribution in #352
- @Shashikant86 made their first contribution in #357
Full Changelog: v0.26.2...v0.26.3