Release v0.4.5


Highlights

The SGLang team is excited to announce the release of v0.4.5! This version introduces several significant features, including Llama 4 support, a FlashAttention 3 backend, EAGLE3 speculative decoding, DeepEP integration, and disaggregated prefill and decoding.

New Features

  • Llama 4 Support: We added support for Llama 4 models with accuracy matching the official benchmark numbers, achieving zero-shot MMLU Pro scores of 75.2 for Llama-4-Scout-17B-16E-Instruct and 80.7 for Llama-4-Maverick-17B-128E-Instruct. #5092

  • FlashAttention 3 Backend: Our implementation of the FlashAttention 3 backend delivers significant acceleration for long-context tasks. #4709

  • EAGLE3 Speculative Decoding: We’re proud to be the first to support EAGLE3 speculative decoding, offering substantial gains in decoding throughput. Learn more in our documentation and the EAGLE3 paper. #4247

  • DeepEP Integration: By incorporating DeepEP, we enhanced performance for MoE (Mixture-of-Experts) inference. #4232

  • Disaggregated Prefill and Decoding: We introduced a prototype for disaggregated prefill and decoding, with plans for further optimizations. #4654
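
For those who want to try these features, the sketches below use SGLang's offline Engine API in Python. They are minimal sketches rather than official recipes: the model names and the `fa3` backend name come from this release, while everything else (tensor-parallel sizes, draft-model paths, tuning values, addresses) is an illustrative assumption.

```python
# Minimal sketch: running Llama-4-Scout via the offline Engine API.
# tp_size=8 is an assumption; choose a value that fits your GPUs.
import sglang as sgl

llm = sgl.Engine(
    model_path="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    tp_size=8,
)
print(llm.generate("The capital of France is", {"temperature": 0, "max_new_tokens": 16}))
llm.shutdown()
```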
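
The FlashAttention 3 backend is selected with `--attention-backend fa3` (see #4680 in the changelog below); through the Engine API the same server argument looks like this, with an illustrative model path:

```python
# Minimal sketch: enabling the FlashAttention 3 attention backend.
# The backend name "fa3" comes from PR #4680; the model path is illustrative.
import sglang as sgl

llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    attention_backend="fa3",
)
print(llm.generate("Long-context inference benefits most:", {"max_new_tokens": 32}))
llm.shutdown()
```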
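
EAGLE3 is driven by the speculative-decoding server arguments. The draft-model path and the step/top-k/draft-token values below are assumptions for illustration; consult the documentation for recommended settings.

```python
# Minimal sketch: EAGLE3 speculative decoding (PR #4247).
# The draft model path and the numeric settings are illustrative
# assumptions, not tuned recommendations.
import sglang as sgl

llm = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    speculative_algorithm="EAGLE3",
    speculative_draft_model_path="jamesliu1/sglang-EAGLE3-LLaMA3.1-Instruct-8B",
    speculative_num_steps=5,
    speculative_eagle_topk=8,
    speculative_num_draft_tokens=32,
)
print(llm.generate("Speculative decoding works by", {"max_new_tokens": 32}))
llm.shutdown()
```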
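
DeepEP targets expert-parallel MoE deployments, typically spanning several nodes. The sketch below assumes the `enable_deepep_moe` server argument from the DeepEP PRs; the parallel sizes, init address, and node ranks are placeholders for a hypothetical two-node run.

```python
# Minimal sketch: MoE inference with the DeepEP integration (PR #4232).
# enable_deepep_moe and ep_size are assumptions based on the linked PRs;
# the address and ranks below are placeholders for a 2-node deployment.
import sglang as sgl

llm = sgl.Engine(
    model_path="deepseek-ai/DeepSeek-V3",
    tp_size=16,
    ep_size=16,                      # expert parallelism (assumption)
    enable_deepep_moe=True,
    dist_init_addr="10.0.0.1:5000",  # placeholder address
    nnodes=2,
    node_rank=0,                     # use node_rank=1 on the second node
    trust_remote_code=True,
)
```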
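
The disaggregation prototype splits prefill and decode across separate instances that hand off the KV cache. The `disaggregation_mode` argument reflects the initial PD code in #4654, but since this is a prototype the interface may change; treat the whole sketch as an assumption.

```python
# Minimal sketch: the prefill/decode disaggregation prototype (PR #4654).
# In practice the prototype runs as two HTTP servers plus a load balancer;
# disaggregation_mode is taken from the initial PD code and may change.
import sglang as sgl

prefill_engine = sgl.Engine(
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    disaggregation_mode="prefill",   # this instance only computes prefill
)
# A second instance launched with disaggregation_mode="decode" would
# receive the transferred KV cache and generate the output tokens.
```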

Thanks very much to the NVIDIA team, LinkedIn team, EAGLE team, Oracle team, Meituan team, and our incredible open-source community for their invaluable contributions!

Coming Soon

  • Disaggregated Prefill and Decoding: #4655

  • Llama 4 Optimization: #5118

  • EP Enhancement: #4734

  • FA3 Enhancement: #4709

We’re thrilled about these advancements and eager to hear your feedback! Join us on our Slack channel at slack.sglang.ai to connect and share your thoughts. Cheers!

What's Changed

  • Fix a regression introduced by overlapping KV cache writing by @merrymercy in #4375
  • Update ci_install_dependency.sh to use accelerate 1.4.0 by @merrymercy in #4392
  • Improve DP attention by @merrymercy in #4390
  • Fix auto merge & add back get_flat_data_by_layer by @merrymercy in #4393
  • Add some fused elementwise kernels for grok-1 by @merrymercy in #4398
  • Fix Llama3.3 tool call support by @CatherineSue in #4320
  • Fix the output of hidden states after HTTP requests by @Qiaolin-Yu in #4269
  • Add a dummy grok test case by @merrymercy in #4399
  • Hot fix for hicache with new page aligned radixtree by @xiezhq-hermann in #4397
  • bump v0.4.4.post1 by @zhyncs in #4402
  • Update CODEOWNERS by @merrymercy in #4403
  • Hierarchical Caching supports MLA by @zeroorhero in #4009
  • cleanup deps 1/n by @zhyncs in #4400
  • feat(remote_model): support variable remote backend for model loader by @DellCurry in #3964
  • [bug] fix duplicate variable MAX_PIXELS in qwen_vl.py by @qibaoyuan in #4419
  • [Doc] fix wrong flag in deepseek documentation by @lausannel in #4427
  • Add moe topk softmax templated from vllm by @qingquansong in #4302
  • bump v0.0.5.post1 by @zhyncs in #4437
  • Fix maximum recursion depth triggered on exception exit by @merrymercy in #4438
  • use topk_softmax with sgl-kernel by @zhyncs in #4439
  • docs: hot fix torch compile cache by @zhaochenyang20 in #4442
  • ci: update transformers==4.48.3 by @mickqian in #4451
  • Fix test_create_kvindices unit test by @sleepcoo in #4452
  • [Fix] Fix errors when using devices other than cuda. by @cboss6 in #4455
  • docs: Add Llama 3.3 to supported models by @JiangJiaWei1103 in #4453
  • Update bench_serving.py by @xu-song in #4454
  • bugfix: Update sampling_params.py by @WrRan in #4413
  • typos: Update sampling_params.md by @WrRan in #4391
  • Auto-detect device if not specified in server arguments. by @vshekhawat-hlab in #4423
  • Add support for upcoming QwenMoe by @michaelfeil in #4447
  • perf: update fused moe config by @mickqian in #4459
  • typos by @WrRan in #4368
  • Fix minor style by @merrymercy in #4460
  • cleanup deps 2/n by @zhyncs in #4464
  • feat: Add FlashMLA submodule by @shuaills in #4449
  • [Fix] use torch.cat instead of torch.concat to prevent entering the Autograd backends. by @Alcanderian in #4466
  • Fix finish step for pr tests and notebook tests by @merrymercy in #4467
  • Remove filter for pr-tests by @merrymercy in #4468
  • Add greedy verification kernel by @Ying1123 in #4383
  • Release sgl-kernel v0.0.5.post2 by @merrymercy in #4469
  • Revert "feat: Add FlashMLA submodule (#4449)" by @zhyncs in #4470
  • [Eagle] Remove the greedy branch and some redundant code by @Ying1123 in #4363
  • Support FlashMLA backend by @sleepcoo in #4472
  • fix custom allreduce performance/accuracy problem by @yizhang2077 in #4477
  • 400 on empty input_ids by @yinghai in #4481
  • Update CODEOWNERS by @merrymercy in #4484
  • Statistical Analysis of the Output Stability of the Deepseek Model by @tanzelin430 in #4202
  • model: support gemma-3-it by @mickqian in #4424
  • Initialize image processor for skip-tokenizer-init codepath by @yinghai in #4479
  • Fix: modelscope env comment by @huiwq1990 in #4474
  • Fix: Complete int32 to int64 conversion by @xiezhq-hermann in #4465
  • [ROCm] enable moe topk softmax in amd by @yiakwy-xpu-ml-framework-team in #4448
  • Feat/support code completion by @woodx9 in #3612
  • Add endpoint for file support, purely to speed up processing of input_embeds. by @RinRin-32 in #2797
  • Set xgrammar as the default grammar backend by @minleminzui in #4386
  • Fix router test by @ByronHsu in #4483
  • [Fix] use torch.inference_mode() instead of torch.no_grad() by @Alcanderian in #4372
  • [Feature] Support Deepseek-VL2 by @ccw1996 in #2798
  • config: Update fused moe config by @mickqian in #4493
  • Support serving DeepSeek-R1-Channel-INT8 with 32 L40S. by @solrex in #4418
  • Support Online Quantization for W8A8 by @hebiao064 in #4485
  • Tool call with text by @xihuai18 in #4067
  • Nicer standalone engine interface by @yinghai in #4480
  • [Fix] Resolve GPU Memory Leak in update_weights_from_tensor by @U-rara in #4446
  • [Doc] add doc for quantization w8a8_fp8 or w8a8_int8 by @HandH1998 in #4495
  • Fix data parallel + tensor parallel by @merrymercy in #4499
  • [ROCm] fix dtype by @yiakwy-xpu-ml-framework-team in #4510
  • Remove redundant type conversion by @merrymercy in #4513
  • Update readme by @merrymercy in #4517
  • [sgl-router] improvement to avoid hang by @yinghai in #4482
  • Revert "feat: update grouped_topk to support softmax and sigmoid" by @ispobock in #4505
  • bump v0.0.5.post3 by @zhyncs in #4520
  • upgrade sgl-kernel 0.0.5.post3 by @zhyncs in #4522
  • sglang quant module remove vllm dependency by @BBuf in #4507
  • Unit test for Hierarchical Caching by @xiezhq-hermann in #4486
  • refactor: rewrite bench-mmmu-sglang by @mickqian in #4458
  • fix: second_per_grid_ts should be used to get mrope position by @mickqian in #3682
  • [Hotfix] solve fp8 w8a8 ci test fail by @BBuf in #4531
  • remove useless backend forward in rotary_embedding by @BBuf in #4500
  • Fix the incorrect args in benchmark_and_profiling.md by @tianyuzhou95 in #4542
  • cleanup deps 3/n by @zhyncs in #4541
  • Add deepseek v2 torch compile pr test by @ispobock in #4538
  • use sgl custom all reduce by @zhyncs in #4441
  • [Fix] Type annotation correction for UpdateWeightsFromTensorReqInput by @U-rara in #4532
  • [Feature] Support EAGLE 3 by @chromecast56 in #4247
  • Reduce computation and communication in DP attention by @ch-wan in #4521
  • [Feature] Support Tensor Parallelism and Weight Slicing for Lora by @aoshen524 in #4274
  • Optimize Triton decoding kernel for dynamic workload by @Alcanderian in #4553
  • [Fix] Fix raw_bs bug when using flashinfer mla and eagle by @Fridge003 in #4557
  • Create col-major and tma-aligned x_scale for deep_gemm.gemm_fp8_fp8_bf16_nt by @strgrb in #4515
  • [Feature] Integrate DeepEP into SGLang by @liz-badada in #4232
  • Support FlashMLA backend cuda graph by @sleepcoo in #4514
  • Add clang-format to pre-commit config by @Hongbosherlock in #4583
  • [fix] fix initialization of _ENABLE_TORCH_INFERENCE_MODE by @Alcanderian in #4549
  • avoid cudaStreamSynchronize in DeepSeekV2AttentionMLA by @strgrb in #4577
  • Support n in OpenAI API completions by @ChuyueSun in #3446
  • [fix] fix illegal mem access and clean up triton attention backend by @Alcanderian in #4571
  • Enable setting sglang logger from Env Variable SGLANG_LOGGING_CONFIG_PATH by @guoyuhong in #4592
  • Update doc for MTP and DP attention by @ispobock in #4622
  • Support fp8 gemm for blackwell by @wenscarl in #4558
  • fix SUPPORT_CUTLASS_BLOCK_FP8 flag by @ch-wan in #4640
  • Enable deepgemm by default on the Hopper architecture by @sleepcoo in #4613
  • [docs] Add links and fix grammars in deploy_on_k8s.md by @windsonsea in #4641
  • Align completion and chat_completion response to OpenAI API by @guoyuhong in #4637
  • [PD] Release initial code by @ByronHsu in #4654
  • fix: fix ipython running error for Engine due to outlines nest_asyncio by @minleminzui in #4582
  • update news for README by @zhyncs in #4664
  • Speed up per token and per tensor quant by 15% by @zcnrex in #4639
  • [quantization] fix channelwise conversion with scalar weight scale by @yundai424 in #4596
  • Correcting default configuration when benchmarking fused_moe by @penguin-wwy in #4665
  • [1/3] fix dsv3 awq issue by @AniZpZ in #4556
  • [Docs] Update docs for gemma3 and VLM chat templates by @adarshxs in #4674
  • [CI fix] test skipping modelopt on AMD by @adarshxs in #4677
  • fix flaky ut by @zhyncs in #4670
  • Add EAGLE mtbench benchmark script by @ispobock in #4676
  • Bug fix for metrics counter by @xiezhq-hermann in #4660
  • [Bug Fix] Add partial rotary factor support for Phi-4 and upgrade to transformers v4.50.0 by @adarshxs in #3984
  • Optimize Permute Kernel in DeepEP by @xutizhou in #4643
  • fix typo SGLang supports three grammar backends by @BroadbentJim in #4679
  • close gemma2 in test_verl_engine.py temporarily by @yizhang2077 in #4685
  • Multiple tiny code cleanups by @fzyzcjy in #4608
  • Support async in DeepEP by @fzyzcjy in #4610
  • refactor: bug fixes and refactor for vlm by @mickqian in #4661
  • Move mem_state update into debug mode by @xiezhq-hermann in #4525
  • Fix RotaryEmbedding when using Triton backend for EXAONE-3.5-2.4B by @lkm2835 in #4064
  • Unify variable naming: replace is_in_free_group with is_not_in_free_group by @c1lovez1 in #4698
  • [ROCm] Enable MTP (NextN) on AMD GPU by @alexsun07 in #4631
  • Support FA3 as Attention backend by using --attention-backend fa3 by @hebiao064 in #4680
  • rename benchmark_deepgemm_fp8_group_gemm.py by @tbzhang in #4605
  • [Quant Kernel] refactored per token group quant fp8 to support int8, up to 2x faster by @zcnrex in #4396
  • Support dynamic version name in sglang's pyproject.toml by @guoyuhong in #4720
  • update pyproject by @zhyncs in #4731
  • [PD] Remove invalid parameter by @XucSh in #4721
  • Fix EAGLE3 for llama3.3 70b by @ispobock in #4716
  • Fix circular imports in gptq.py and unblock test explorer by @hebiao064 in #4736
  • [Model] Support Qwen2ForSequenceClassification by @Ximingwang-09 in #4609
  • Support FP4 gemm (1/2) by @trevor-m in #3899
  • Add DeepEP tests into CI by @fzyzcjy in #4737
  • model: Minicpmo by @mickqian in #3023
  • support cu128 sgl-kernel by @zhyncs in #4744
  • [Benchmark] tilelang vs deepgemm vs w8a8_block_fp8_matmul by @zcnrex in #4735
  • Super tiny fix typo by @fzyzcjy in #4738
  • fix FlashMLA cudagraph config by @sleepcoo in #4691
  • Speedup warmup when DP > 1 by @fzyzcjy in #4695
  • Add endpoints to dump selected expert ids by @yuhsuan-t in #4435
  • add dsv3 int8 test by @HandH1998 in #4705
  • [Feature] Support "strict" in function calling by @DarkSharpness in #4310
  • Revert "Add DeepEP tests into CI (#4737)" by @fzyzcjy in #4751
  • Fix test_expert_distribution failure by @fzyzcjy in #4752
  • Fix warmup error when dp=1 by @fzyzcjy in #4753
  • Add retry for flaky tests in CI by @fzyzcjy in #4755
  • [Fix] Fix unexpected idx bug of Phi-3-small by @Fridge003 in #4728
  • Warn users when release_memory_occupation is called without memory saver enabled by @fzyzcjy in #4566
  • fix(typo): fix reply to replay in base_attn_backend.py by @Thysrael in #4784
  • Support recording experts workload in QWen2-MoE by @ch-wan in #4775
  • Fix popen_launch_server wait for 20 minutes when child process exits by @fzyzcjy in #4777
  • Use metadata to detect version of package by @kebe7jun in #4782
  • Fix shared memory OOM on sm86 GPUs. by @Conless in #4797
  • Support compressed tensors fp8w8a8 by @BBuf in #4743
  • bump v0.4.4.post2 by @zhyncs in #4669
  • [3/3] fix dsv3 awq issue by @laixinn in #4719
  • Update supported_models.md: adding open-r1 Olympic Code 32B by HuggingFace by @didier-durand in #4628
  • Align finish reason and stream mode in openai api by @xihuai18 in #4388
  • support clip embedding model by @Titan-p in #4506
  • update xgrammar 0.1.17 by @zhyncs in #4804
  • Patch PyTorch's bug that cross-process tensor transfer will lead to wrong device by @fzyzcjy in #4565
  • [FA3 Attn Backend] Remove Unnecessary Device Sync for FA3 by @hebiao064 in #4745
  • support cmake for sgl-kernel by @zhyncs in #4706
  • Use apply_rope_with_cos_sin_cache_inplace for DeepSeek by @strgrb in #4764
  • Fix ut mla-test-1-gpu-amd by @strgrb in #4813
  • Remove Unintended Capture Batch Sizes in AMD HIP Graph Runner by @gmlwns2000 in #4638
  • [k8s] Clarified the usage of shared memory. by @jsuchome in #4341
  • gemma3: impl get_attention_sliding_window_size for attn init by @vhain in #4823
  • add partial_json_parser and einops by @zhyncs in #4827
  • fix the release doc dependency issue by @zhyncs in #4828
  • Update doc for DeepSeek-V3-0324 by @ispobock in #4825
  • deps: lazy import optional dependencies gguf and torchvision by @vhain in #4826
  • Update MMMU Benchmark instructions by @ravi03071991 in #4694
  • Fix the nightly eval by lowering the threshold of neuralmagic/gemma-2-2b-it-FP8 by @merrymercy in #4830
  • Basic Cleanup by @danielholanda in #4833
  • Support (1 <= dp < tp) in the dp attention in DeepEP by @tarinkk in #4770
  • [Fix] Add compressed_tensors as deps by @ocss884 in #4819
  • Fix error due to CustomAllreduce setup failure by @kebe7jun in #4815
  • use default for torch.ops by @zhyncs in #4835
  • [CI] Remove unused imports with Ruff to pre-commit config, only to benchmarks/docs/examples folder by @b8zhong in #3969
  • [Misc] Fix issues reported by torchfix by @b8zhong in #4837
  • Include context length in /v1/models response. by @jondurbin in #4809
  • [Fix] self.worker assignment in TpModelWorker and refactor references by @JustinTong0323 in #4788
  • Fix the lora adapter when lora path is none by @Qiaolin-Yu in #4799
  • fix: fix typo of comments in w8a8_fp8.py by @ZhuJiaqi9905 in #4843
  • Remove retry in nightly tests by @fzyzcjy in #4846
  • Fix CI of test_patch_torch by @fzyzcjy in #4844
  • IPv6 support by @vincent-4 in #3949
  • ci: add condition for daily docker build by @warjiang in #4487
  • [Fix] fix output_top_logprobs is not exist by @lambert0312 in #4597
  • fix: when use SGLANG_PORT this env,port is str by @lengrongfu in #4528
  • Support Page Size > 1 for FA3 by @hebiao064 in #4832
  • Fix Engine error when enabling DP attention by @fzyzcjy in #4648
  • fix: Inappropriate lack of Optional type on OpenAI ChatCompletionRequest by @BroadbentJim in #4681
  • Support controlling nsys start and end range programmatically by @fzyzcjy in #4688
  • Remove empty tool function name by @kebe7jun in #4704
  • Fix missing arguments in SchedulePolicy and RadixCache initialization in tests. by @vshekhawat-hlab in #4712
  • get the python version from env by @DavidChan0519 in #4729
  • Fix torch.cuda.MemPool() internal assertion failure by @fzyzcjy in #4687
  • Super tiny remove unused code by @fzyzcjy in #4750
  • Support with_stack and record_shapes in profiler by @fzyzcjy in #4740
  • test: reduce mem_fraction_static for gemma3 vision test by @vhain in #4840
  • Fix CI tests by @merrymercy in #4853
  • Fix fa3 cuda graph page_size > 1 precision and page_size=1 speed by @qingquansong in #4855
  • Revert "get the python version from env (#4729)" by @zhyncs in #4863
  • [Feature] add multi-rank support for Lora by @jcbjcbjc in #4492
  • Clean up import vllm in quantization/__init__.py by @merrymercy in #4834
  • Fix wrong variable name when stopping memory profile by @Fr4nk1inCs in #4772
  • [Feat] support deepgemm for cmake by @yinfan98 in #4864
  • Make torch compile configurable for biased_grouped_topk by @qingquansong in #4749
  • update sgl-kernel test ci by @zhyncs in #4866
  • fix sampling issue by @zhyncs in #4871
  • bump sgl-kernel 0.0.5.post4 by @zhyncs in #4768
  • fix sgl-kernel cu118 build by @zhyncs in #4872
  • [Feature] Support FA3 backend for MLA by @Fridge003 in #4831
  • upgrade sgl-kernel 0.0.5.post4 by @zhyncs in #4873
  • update torch compile doc by @ispobock in #4874
  • bump v0.4.4.post3 by @zhyncs in #4878
  • Fix BadRequestError wrong arguments and remove openai dependency by @fzyzcjy in #4882
  • Improve stack trace of retry errors by @fzyzcjy in #4845
  • Tiny fix doc error by @fzyzcjy in #4795
  • [Docs] Update DeepGemm at README.md by @yinfan98 in #4886
  • Update CODEOWNERS by @zhyncs in #4889
  • Delete test_deep_gemm.py by @yinfan98 in #4891
  • Add deepseek style fused moe group gate selection kernel by @qingquansong in #4530
  • quick fix: add default for new kernel by @yinfan98 in #4898
  • remove setup for sgl-kernel by @zhyncs in #4899
  • [Misc] Clean m.def and add Development Tips by @yinfan98 in #4890
  • fix allreduce test by @yizhang2077 in #4909
  • Support page size > 1 + eagle by @merrymercy in #4908
  • Fix retract for page size > 1 by @merrymercy in #4914
  • [Feature] use pytest for sgl-kernel by @adarshxs in #4896
  • fix bmm fp8 by @zhyncs in #4926
  • Fix the timeout for unit-test-2-gpu in pr-test.yml by @merrymercy in #4927
  • Fix 2-gpu CI test and suppress some warnings by @merrymercy in #4930
  • [feat] add fa3 in sgl-kernel by @yinfan98 in #4902
  • Fix sglang frontend's incorrect dependency on torch by @seplos in #4931
  • [Fix] avoid stream sync and torch compile in prefill for fa3 backend by @Fridge003 in #4932
  • cleanup sgl-kernel by @zhyncs in #4933
  • [Fix] Improve Lora tests and reduce CI runtime by @Fridge003 in #4925
  • Fix DeepSeek bug causing 2.2% MMLU drop when TP!=DP by @fzyzcjy in #4883
  • [Fix] Add torch compile for torch.clamp back by @Fridge003 in #4936
  • Fix oom error for large page size by @xiezhq-hermann in #4913
  • [feat] interface for platforms abstraction by @Alcanderian in #4928
  • [Fix] revert clean m.def for cudagraph by @yinfan98 in #4944
  • refactor: multimodal data by @mickqian in #4754
  • bump sgl-kernel v0.0.6 by @zhyncs in #4950
  • [Build] Fix cuda12.8 build error in nvfp4_scaled_mm_kernels.cu by @guoyuhong in #4953
  • use fa3 in sgl-kernel by @zhyncs in #4954
  • Revert PR 4764 & 4813 related to R1 RoPE by @guoyuhong in #4959
  • [Feature] Support DeepEP Low Latency by @liz-badada in #4767
  • update bench_serving by @zhyncs in #4958
  • Prevent memory leak of retract_decode when page_size > 1 by @xiezhq-hermann in #4977
  • [VLM RLHF] Take Image input for verl vlm rollout by @JustinTong0323 in #4915
  • Large page size aligned hierarchical caching by @xiezhq-hermann in #4581
  • bug fix for hicache host eviction by @xiezhq-hermann in #4989
  • sgl scaled_fp8_quant support output padding by @BBuf in #4861
  • Add Eagle Speculative Decoding to FA3 Backend by @qingquansong in #4951
  • Update tokenizer_manager.py by @yangky11 in #5008
  • [sgl-kernel] per token group quant support COLUMN MAJOR by @BBuf in #4817
  • update cutlass tag by @xiezhq-hermann in #5011
  • Feature/revise docs ci by @renxinx in #5009
  • fix: fix illegal cuda memory access at fused_moe_kernel by @saltyfish66 in #4727
  • [Build] Support build sgl-kernel with ccache by @guoyuhong in #5020
  • fix deepgemm as well by @xiezhq-hermann in #5030
  • try to fix ci oserror by @BBuf in #5024
  • Replace enable_flashinfer_mla argument with attention_backend by @Fridge003 in #5005
  • Small refactor DeepEPMode to clean up code a bit by @fzyzcjy in #4992
  • [Fix] fix fa3 build at cu118 by @yinfan98 in #5036
  • Revert "Replace enable_flashinfer_mla argument with attention_backend" by @merrymercy in #5048
  • bump sgl-kernel v0.0.7 by @zhyncs in #5046
  • update eagle-3 docs by @simveit in #4796
  • Add LlavaLlamaForCausaLM in MultiModal Processors by @ravi03071991 in #5039
  • Update the retry count by @zhyncs in #5051
  • upgrade sgl-kernel v0.0.7 by @zhyncs in #5049
  • [2/3] fix dsv3 awq issue by @AniZpZ in #4625
  • Feature/revise docs ci by @renxinx in #5056
  • Add H20 fused MoE kernel tuning configs for DeepSeek V3/R1 by @M0gician in #5057
  • [fix] remove cuda_device_count_stateless by @Alcanderian in #5060
  • Small refactor DeepEPDispatcher into subclasses by @fzyzcjy in #4994
  • Support async DeepEP by splitting into two stages by @fzyzcjy in #4995
  • Cleanup unused resources after DeepEP operation by @fzyzcjy in #4996
  • Add DeepSeek V3/R1 shared experts fusion by @BBuf in #4918
  • [deepep] fix: shared experts are not initialized when shared experts fusion is disabled by @ch-wan in #5072
  • fix dummy-load deepseekv2 by @inkcherry in #4535
  • support sgl-kernel on blackwell by @zhyncs in #5074
  • FA3 Spec Decoding to support top k = 1 and add cuda graph support by @hebiao064 in #5050
  • [Revision] Replace enable_flashinfer_mla argument with attention_backend by @Fridge003 in #5052
  • upgrade transformers 4.51.0 by @zhyncs in #5088
  • sgl-kernel transfer custom allreduce from trt kernel to vllm kernel by @yizhang2077 in #5079
  • bump sgl-kernel 0.0.8 by @zhyncs in #5089
  • python transfer custom allreduce from trt kernel to vllm kernel by @yizhang2077 in #5080
  • bump v0.4.4.post4 by @zhyncs in #5091
  • Fix: Reduce the number of document ci attempts to avoid long ci running by @minleminzui in #5097
  • Add Llama4 support by @CatherineSue in #5092
  • Fix refactor error - fp8.py by @HaiShaw in #5106
  • bump v0.4.5 by @zhyncs in #5117

New Contributors

  • @DellCurry made their first contribution in #3964
  • @lausannel made their first contribution in #4427
  • @JiangJiaWei1103 made their first contribution in #4453
  • @xu-song made their first contribution in #4454
  • @yinghai made their first contribution in #4481
  • @tanzelin430 made their first contribution in #4202
  • @huiwq1990 made their first contribution in #4474
  • @woodx9 made their first contribution in #3612
  • @ccw1996 made their first contribution in #2798
  • @solrex made their first contribution in #4418
  • @U-rara made their first contribution in #4446
  • @tianyuzhou95 made their first contribution in #4542
  • @chromecast56 made their first contribution in #4247
  • @strgrb made their first contribution in #4515
  • @liz-badada made their first contribution in #4232
  • @Hongbosherlock made their first contribution in #4583
  • @guoyuhong made their first contribution in #4592
  • @wenscarl made their first contribution in #4558
  • @penguin-wwy made their first contribution in #4665
  • @xutizhou made their first contribution in #4643
  • @BroadbentJim made their first contribution in #4679
  • @lkm2835 made their first contribution in #4064
  • @c1lovez1 made their first contribution in #4698
  • @alexsun07 made their first contribution in #4631
  • @tbzhang made their first contribution in #4605
  • @XucSh made their first contribution in #4721
  • @yuhsuan-t made their first contribution in #4435
  • @Thysrael made their first contribution in #4784
  • @Conless made their first contribution in #4797
  • @gmlwns2000 made their first contribution in #4638
  • @jsuchome made their first contribution in #4341
  • @danielholanda made their first contribution in #4833
  • @tarinkk made their first contribution in #4770
  • @ocss884 made their first contribution in #4819
  • @b8zhong made their first contribution in #3969
  • @jondurbin made their first contribution in #4809
  • @JustinTong0323 made their first contribution in #4788
  • @ZhuJiaqi9905 made their first contribution in #4843
  • @vincent-4 made their first contribution in #3949
  • @warjiang made their first contribution in #4487
  • @lengrongfu made their first contribution in #4528
  • @jcbjcbjc made their first contribution in #4492
  • @Fr4nk1inCs made their first contribution in #4772
  • @seplos made their first contribution in #4931
  • @yangky11 made their first contribution in #5008
  • @renxinx made their first contribution in #5009
  • @saltyfish66 made their first contribution in #4727
  • @inkcherry made their first contribution in #4535

Full Changelog: v0.4.4...v0.4.5
