github sgl-project/sglang v0.4.3

latest releases: v0.4.3.post2, v0.4.3.post1
8 days ago

Highlights

The SGLang team is excited to announce the release of v0.4.3. We will keep improving DeepSeek V3/R1 performance. In the last six weeks, SGLang has been the fastest engine running DeepSeek V3/R1 among all open-source LLM inference engines. We stay ahead by integrating FlashInfer MLA and optimizing further. Look out for new optimizations coming soon! Please feel free to join our Slack channel https://slack.sglang.ai Cheers!

Performance Improvements

DeepSeek V3/R1 Optimizations

  • Pioneering integration of FlashInfer MLA Attention delivers 4x performance improvement for long-context scenarios (Special thanks to the FlashInfer team @yzh119 ) #3550
  • Added torch.compile support for FP8, achieving 50 tokens/s for online inference #3232
  • Implemented CUTLASS block-wise FP8 for enhanced efficiency

Architecture Enhancements

  • Upgraded to FlashInfer v0.2
  • Enabled Flash Attention 3 by default for prefill
  • Extended EAGLE 2 support:
    • Enhanced integration with FlashInfer backend
    • Added support in Triton backend

New Features

  • Introduced Function Calling capabilities
  • Added regex pattern support in XGrammar backend
  • Implemented custom sampling processor for flexible inference control
  • Integrated LoRA support in Triton backend

What's Changed

New Contributors

  • @fsygd made their first contribution in #2596
  • @fzyzcjy made their first contribution in #2565
  • @JamesSand made their first contribution in #2574
  • @yudian0504 made their first contribution in #2521
  • @kzhou003 made their first contribution in #2570
  • @XiaotongJiang made their first contribution in #2652
  • @mobicham made their first contribution in #2669
  • @roG0d made their first contribution in #2707
  • @mickqian made their first contribution in #2714
  • @BruceXcluding made their first contribution in #2601
  • @gaocegege made their first contribution in #2727
  • @libratiger made their first contribution in #2571
  • @observerw made their first contribution in #2745
  • @Edwardf0t1 made their first contribution in #2535
  • @xingyaoww made their first contribution in #2513
  • @jjjjohnson made their first contribution in #2723
  • @minleminzui made their first contribution in #2773
  • @sleepcoo made their first contribution in #2816
  • @Mutinifni made their first contribution in #2819
  • @CatherineSue made their first contribution in #2822
  • @Muqi1029 made their first contribution in #2835
  • @gty111 made their first contribution in #2826
  • @coolhok made their first contribution in #2730
  • @sogalin made their first contribution in #2852
  • @yundai424 made their first contribution in #2821
  • @saienduri made their first contribution in #2927
  • @chunyuan-w made their first contribution in #2806
  • @HermitSun made their first contribution in #2944
  • @giorgiopiatti-dfinity made their first contribution in #2982
  • @seungduk-yanolja made their first contribution in #2839
  • @hongpeng-guo made their first contribution in #2396
  • @lcskrishna made their first contribution in #2995
  • @yiakwy-xpu-ml-framework-team made their first contribution in #3003
  • @josephydu made their first contribution in #2939
  • @sudo-root-ns made their first contribution in #3055
  • @Fridge003 made their first contribution in #3038
  • @simveit made their first contribution in #2742
  • @trevor-m made their first contribution in #3037
  • @yinfan98 made their first contribution in #3130
  • @hubertlu-tw made their first contribution in #3085
  • @YAMY1234 made their first contribution in #2700
  • @jhinpan made their first contribution in #3144
  • @falegh made their first contribution in #3190
  • @ravi03071991 made their first contribution in #3229
  • @whchung made their first contribution in #3255
  • @lycanlancelot made their first contribution in #3205
  • @kushanam made their first contribution in #3272
  • @lizamd made their first contribution in #3356
  • @zstreet87 made their first contribution in #3275
  • @WhatGhost made their first contribution in #3346
  • @Jackmin801 made their first contribution in #3364
  • @didier-durand made their first contribution in #3497

Full Changelog: v0.4.1...v0.4.3

Don't miss a new sglang release

NewReleases is sending notifications on new releases.