This release restores functionality for other quantized MoEs, which was broken by a regression introduced as part of the initial DeepSeek V3 support 🙇.
What's Changed
- [Docs] Document Deepseek V3 support by @simon-mo in #11535
- Update openai_compatible_server.md by @robertgshaw2-neuralmagic in #11536
- [V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling by @WoosukKwon in #11394
- [V1] Fix yapf by @WoosukKwon in #11538
- [CI] Fix broken CI by @robertgshaw2-neuralmagic in #11543
- [misc] fix typing by @youkaichao in #11540
- [V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly by @robertgshaw2-neuralmagic in #11534
- [BugFix] Deepseekv3 broke quantization for all other methods by @robertgshaw2-neuralmagic in #11547
Full Changelog: v0.6.6...v0.6.6.post1