vllm v0.8.3

Highlights

This release features 260 commits from 109 contributors, including 38 new contributors.

  • We are excited to announce Day 0 support for Llama 4 Scout and Maverick (#16104). Please see our blog for a detailed user guide; a minimal offline-inference sketch follows this list.
    • Please note that Llama 4 is currently supported only in the V1 engine.
  • V1 engine now supports native sliding window attention (#14097) with the hybrid memory allocator.
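
A minimal offline-inference sketch for running Llama 4 Scout on the V1 engine. The Hugging Face repository id, GPU count, and context length below are assumptions; adjust them to your deployment.

```python
# Hedged sketch: run Llama 4 Scout offline on the V1 engine.
# The repository id, tensor-parallel size, and max_model_len are assumptions.
import os

os.environ["VLLM_USE_V1"] = "1"  # Llama 4 is currently supported only on V1

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed HF repo id
    tensor_parallel_size=8,   # adjust to the number of GPUs available
    max_model_len=8192,       # kept modest for a quick smoke test
)

outputs = llm.generate(
    ["Explain sliding window attention in one sentence."],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```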

Cluster Scale Serving

  • Single-node data parallel with API server support (#13923); a simplified replication sketch follows this list.
  • Multi-node offline DP+EP example (#15484)
  • Expert parallelism enhancements
    • CUTLASS grouped gemm fp8 MoE kernel (#13972)
    • Fused experts refactor (#15914)
    • Fp8 Channelwise Dynamic Per Token GroupedGEMM (#15587)
    • Adding support for fp8 gemm layer input in fp8 (#14578)
    • Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932)
  • Support XpYd disaggregated prefill with MooncakeStore (#12957)
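
A simplified data-parallel sketch, as referenced above: it only replicates independent engines and shards prompts across them, and does not exercise the new DP/EP plumbing from #13923 / #15484. The process layout, GPU pinning, and tiny model are assumptions for illustration.

```python
# Illustrative only: one engine replica per data-parallel rank, prompts
# sharded round-robin. The real DP/EP support lives in the PRs listed above.
import multiprocessing as mp
import os


def dp_worker(rank: int, world_size: int, prompts: list[str]) -> None:
    # Pin the replica to its own GPU before vLLM initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)

    from vllm import LLM, SamplingParams  # import after setting the env var

    llm = LLM(model="facebook/opt-125m")  # small model, sketch only
    shard = prompts[rank::world_size]     # this rank's slice of the work
    for out in llm.generate(shard, SamplingParams(max_tokens=32)):
        print(f"[rank {rank}] {out.outputs[0].text!r}")


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)  # avoid forking a CUDA context
    world_size = 2
    prompts = [f"Question {i}: what is {i} + {i}?" for i in range(8)]
    procs = [mp.Process(target=dp_worker, args=(r, world_size, prompts))
             for r in range(world_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```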

Model Support

V1 Engine

  • Collective RPC (#15444)
  • Faster top-k only implementation (#15478)
  • BitsAndBytes support (#15611)
  • Speculative Decoding: metrics (#15151), Eagle Proposer (#15729), n-gram interface update (#15750), EAGLE Architecture with Proper RMS Norms (#14990)
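
A hedged sketch of n-gram (prompt-lookup) speculative decoding on V1, the feature the metrics and interface updates above touch. The `speculative_config` keys follow the current documentation and may differ slightly at this version; treat them as assumptions and consult the speculative decoding guide.

```python
# Hedged sketch: n-gram speculative decoding on the V1 engine.
# The speculative_config keys below are assumptions for this release.
import os

os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # assumed target model
    speculative_config={
        "method": "ngram",            # draft tokens via prompt lookup
        "num_speculative_tokens": 5,  # tokens proposed per step
        "prompt_lookup_max": 4,       # longest n-gram to match in the prompt
    },
)

out = llm.generate(
    ["Repeat after me: the quick brown fox jumps over the lazy dog."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(out[0].outputs[0].text)
```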

Features

API

  • Support Enum for xgrammar-based structured output in V1 (#15594, #15757); see the constrained-output sketch after this list.
  • A new tags parameter for wake_up (#15500)
  • V1 LoRA support CPU offload (#15843)
  • Prefix caching support: FIPS enabled machines with MD5 hashing (#15299), SHA256 as alternative hashing algorithm (#15297)
  • Addition of HTTP service metrics (#15657)
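
A hedged sketch of enum-constrained output related to the structured-output items above. It uses the generic `choice` constraint from `GuidedDecodingParams` as a stand-in for the Enum support; the model and backend selection are assumptions, and whether decoding routes through xgrammar depends on the configured guided-decoding backend.

```python
# Hedged sketch: constrain generation to a fixed set of Enum values.
import os
from enum import Enum

os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams


class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct",
          guided_decoding_backend="xgrammar")  # backend name is an assumption

params = SamplingParams(
    max_tokens=8,
    guided_decoding=GuidedDecodingParams(choice=[s.value for s in Sentiment]),
)

out = llm.generate(["Review: 'The food was great!' Sentiment:"], params)
print(out[0].outputs[0].text)
```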

Performance

  • LoRA Scheduler optimization bridging V1 and V0 performance (#15422).
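
The scheduler optimization above targets multi-LoRA serving; a hedged sketch of that workload with the offline API follows. The base model, adapter name, and adapter path are assumptions.

```python
# Hedged sketch: the multi-LoRA workload the V1 scheduler optimization targets.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # assumed base model
    enable_lora=True,
    max_loras=4,        # adapters kept resident at once
    max_lora_rank=16,
)

sql_lora = LoRARequest("sql-adapter", 1, "/path/to/sql_lora_adapter")  # assumed path

out = llm.generate(
    ["Translate to SQL: list all users created this week."],
    SamplingParams(max_tokens=64),
    lora_request=sql_lora,
)
print(out[0].outputs[0].text)
```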

Hardware

  • AMD:
    • Add custom allreduce support for ROCM (#14125)
    • Quark quantization documentation (#15861)
    • AITER integration: int8 scaled gemm kernel (#15433), fused moe (#14967)
    • Paged attention for V1 (#15720)
  • CPU:
  • TPU:
    • Improve Memory Usage Estimation (#15671)
    • Optimize the all-reduce performance (#15903)
    • Support sliding window and logit soft capping in the paged attention kernel. (#15732)
    • TPU-optimized top-p implementation (avoids scattering). (#15736)

Doc, Build, Ecosystem

  • V1 user guide update: fp8 kv cache support (#15585), multi-modality (#15460)
  • Recommend developing with Python 3.12 in developer guide (#15811)
  • Clean up: move dockerfiles into their own directory (#14549)
  • Add minimum version for huggingface_hub to enable Xet downloads (#15873)
  • TPU CI: Add basic perf regression test (#15414)

What's Changed

  • Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 by @houseroad in #15160
  • [Hardware][TPU][Bugfix] Fix v1 mp profiler by @lsy323 in #15409
  • [Kernel][CPU] CPU MLA by @gau-nernst in #14744
  • Dockerfile.ppc64le changes to move to UBI by @Shafi-Hussain in #15402
  • [Misc] Clean up MiniCPM-V/O code by @DarkLight1337 in #15337
  • [Misc] Remove redundant num_embeds by @DarkLight1337 in #15443
  • [Doc] Update V1 user guide for multi-modality by @DarkLight1337 in #15460
  • [Kernel] Fix conflicting macro names for gguf kernels by @SzymonOzog in #15456
  • [bugfix] fix inductor cache on max_position_embeddings by @youkaichao in #15436
  • [CI/Build] Add tests for the V1 tpu_model_runner. by @yarongmu-google in #14843
  • [Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) by @oteroantoniogom in #15471
  • [bugfix] add supports_v1 platform interface by @joerunde in #15417
  • Add workaround for shared field_names in pydantic model class by @maxdebayser in #13925
  • [TPU][V1] Fix Sampler recompilation by @NickLucche in #15309
  • [V1][Minor] Use SchedulerInterface type for engine scheduler field by @njhill in #15499
  • [V1] Support long_prefill_token_threshold in v1 scheduler by @houseroad in #15419
  • [core] add bucket padding to tpu_model_runner by @Chenyaaang in #14995
  • [Core] LoRA: V1 Scheduler optimization by @varun-sundar-rabindranath in #15422
  • [CI/Build] LoRA: Delete long context tests by @varun-sundar-rabindranath in #15503
  • Transformers backend already supports V1 by @hmellor in #15463
  • [Model] Support multi-image for Molmo by @DarkLight1337 in #15438
  • [Misc] Warn about v0 in benchmark_paged_attn.py by @tlrmchlsmth in #15495
  • [BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) by @LucasWilkinson in #15492
  • [misc] LoRA - Skip LoRA kernels when not required by @varun-sundar-rabindranath in #15152
  • Fix raw_request extraction in load_aware_call decorator by @daniel-salib in #15382
  • [Feature] Enhance EAGLE Architecture with Proper RMS Norms by @luyuzhe111 in #14990
  • [FEAT][ROCm] Integrate Fused MoE Kernels from AITER by @vllmellm in #14967
  • [Misc] Enhance warning information to user-defined chat template by @wwl2755 in #15408
  • [Misc] improve example script output by @reidliu41 in #15528
  • Separate base model from TransformersModel by @hmellor in #15467
  • Apply torchfix by @cyyever in #15532
  • Improve validation of TP in Transformers backend by @hmellor in #15540
  • [Model] Add Reasoning Parser for Granite Models by @alex-jw-brooks in #14202
  • multi-node offline DP+EP example by @youkaichao in #15484
  • Fix weight loading for some models in Transformers backend by @hmellor in #15544
  • [Refactor] Remove passthrough backend when generate grammar by @aarnphm in #15317
  • [V1][Sampler] Faster top-k only implementation by @njhill in #15478
  • Support SHA256 as hash function in prefix caching by @dr75 in #15297
  • Applying some fixes for K8s agents in CI by @Alexei-V-Ivanov-AMD in #15493
  • [V1] TPU - Revert to exponential padding by default by @alexm-redhat in #15565
  • [V1] TPU CI - Fix test_compilation.py by @alexm-redhat in #15570
  • Use Cache Hinting for fused_moe kernel by @wrmedford in #15511
  • [TPU] support disabling xla compilation cache by @yaochengji in #15567
  • Support FIPS enabled machines with MD5 hashing by @MattTheCuber in #15299
  • [Kernel] CUTLASS grouped gemm fp8 MoE kernel by @ElizaWszola in #13972
  • Add automatic tpu label to mergify.yml by @mgoin in #15560
  • add platform check back by @Chenyaaang in #15578
  • [misc] LoRA: Remove unused long context test data by @varun-sundar-rabindranath in #15558
  • [Doc] Update V1 user guide for fp8 kv cache support by @wayzeng in #15585
  • [moe][quant] add weight name case for offset by @MengqingCao in #15515
  • [V1] Refactor num_computed_tokens logic by @comaniac in #15307
  • Allow torchao quantization in SiglipMLP by @jerryzh168 in #15575
  • [ROCm] Env variable to trigger custom PA by @gshtras in #15557
  • [TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS by @yaochengji in #15583
  • [Misc] Restrict ray version dependency and update PP feature warning in V1 by @ruisearch42 in #15556
  • [TPU] Avoid Triton Import by @robertgshaw2-redhat in #15589
  • [Misc] Consolidate LRUCache implementations by @Avabowler in #15481
  • [Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM by @robertgshaw2-redhat in #15587
  • [Misc] Clean up scatter_patch_features by @DarkLight1337 in #15559
  • [Misc] Use model_redirect to redirect the model name to a local folder. by @noooop in #14116
  • Fix incorrect filenames in vllm_compile_cache.py by @zou3519 in #15494
  • [Doc] update --system for transformers installation in docker doc by @reidliu41 in #15616
  • [Model] MiniCPM-V/O supports V1 by @DarkLight1337 in #15487
  • [Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 by @h-sugi in #15211
  • [Doc] Link to onboarding tasks by @DarkLight1337 in #15629
  • [Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs by @DarkLight1337 in #15620
  • [Feature] Add middleware to log API Server responses by @terrytangyuan in #15593
  • [Misc] Avoid direct access of global mm_registry in compute_encoder_budget by @DarkLight1337 in #15621
  • [Doc] Use absolute placement for Ask AI button by @hmellor in #15628
  • [Bugfix][TPU][V1] Fix recompilation by @NickLucche in #15553
  • Correct PowerPC to modern IBM Power by @clnperez in #15635
  • [CI] Update rules for applying tpu label. by @russellb in #15634
  • [V1] AsyncLLM data parallel by @njhill in #13923
  • [TPU] Lazy Import by @robertgshaw2-redhat in #15656
  • [Quantization][V1] BitsAndBytes support V1 by @jeejeelee in #15611
  • [Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. by @kebe7jun in #14948
  • [Doc] Fix dead links in Job Board by @wwl2755 in #15637
  • [CI][TPU] Temporarily Disable Quant Test on TPU by @robertgshaw2-redhat in #15649
  • Revert "Use Cache Hinting for fused_moe kernel (#15511)" by @wrmedford in #15645
  • [Misc]add coding benchmark for speculative decoding by @CXIAAAAA in #15303
  • [Quantization][FP8] Adding support for fp8 gemm layer input in fp8 by @gshtras in #14578
  • Refactor error handling for multiple exceptions in preprocessing by @JasonZhu1313 in #15650
  • [Bugfix] Fix mm_hashes forgetting to be passed by @DarkLight1337 in #15668
  • [V1] Remove legacy input registry by @DarkLight1337 in #15673
  • [TPU][CI] Fix TPUModelRunner Test by @robertgshaw2-redhat in #15667
  • [Refactor][Frontend] Keep all logic about reasoning into one class by @gaocegege in #14428
  • [CPU][CI] Improve CPU Dockerfile by @bigPYJ1151 in #15690
  • [Bugfix] Fix 'InductorAdaptor object has no attribute 'cache_dir' by @jeejeelee in #15674
  • [Misc] Fix test_sleep to use query parameters by @lizzzcai in #14373
  • [Bugfix][Frontend] Eliminate regex based check in reasoning full generator by @gaocegege in #14821
  • [Frontend] update priority for --api-key and VLLM_API_KEY by @reidliu41 in #15588
  • [Docs] Add "Generation quality changed" section to troubleshooting by @hmellor in #15701
  • [Model] Adding torch compile annotations to chatglm by @jeejeelee in #15624
  • [Bugfix][v1] xgrammar structured output supports Enum. by @chaunceyjiang in #15594
  • [Bugfix] embed_is_patch for Idefics3 by @DarkLight1337 in #15696
  • [V1] Support disable_any_whtespace for guidance backend by @russellb in #15584
  • [doc] add missing imports by @reidliu41 in #15699
  • [Bugfix] Fix regex compile display format by @kebe7jun in #15368
  • Fix cpu offload testing for gptq/awq/ct by @mgoin in #15648
  • [Minor] Remove TGI launching script by @WoosukKwon in #15646
  • [Misc] Remove unused utils and clean up imports by @DarkLight1337 in #15708
  • [Misc] Remove stale func in KVTransferConfig by @ShangmingCai in #14746
  • [TPU] [Perf] Improve Memory Usage Estimation by @robertgshaw2-redhat in #15671
  • [Bugfix] [torch.compile] Add Dynamo metrics context during compilation by @ProExpertProg in #15639
  • [V1] TPU - Fix the chunked prompt bug by @alexm-redhat in #15713
  • [Misc] cli auto show default value by @reidliu41 in #15582
  • implement prometheus fast-api-instrumentor for http service metrics by @daniel-salib in #15657
  • [Docs][V1] Optimize diagrams in prefix caching design by @simpx in #15716
  • [ROCm][AMD][Build] Update AMD supported arch list by @gshtras in #15632
  • [Model] Support Skywork-R1V by @pengyuange in #15397
  • [Docs] Document v0 engine support in reasoning outputs by @gaocegege in #15739
  • [Misc][V1] Misc code streamlining by @njhill in #15723
  • [Bugfix] LoRA V1: add and fix entrypoints tests by @varun-sundar-rabindranath in #15715
  • [CI] Speed up V1 structured output tests by @russellb in #15718
  • Use numba 0.61 for python 3.10+ to support numpy>=2 by @cyyever in #15692
  • [Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoionts.openai.api_server by @jinzhen-lin in #15700
  • [TPU][V1][Bugfix] Fix w8a8 recompiilation with GSM8K by @NickLucche in #15714
  • [Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 by @yarongmu-google in #15659
  • [doc] update doc by @reidliu41 in #15740
  • [FEAT] [ROCm] Add AITER int8 scaled gemm kernel by @tjtanaa in #15433
  • [V1] [Feature] Collective RPC by @wwl2755 in #15444
  • [Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore by @ShangmingCai in #12957
  • [V1] Support interleaved modality items by @ywang96 in #15605
  • [V1][Minor] Simplify rejection sampler's parse_output by @WoosukKwon in #15741
  • [Bugfix] Fix Mllama interleaved images input support by @Isotr0py in #15564
  • [CI] xgrammar structured output supports Enum. by @chaunceyjiang in #15757
  • [Bugfix] Fix Mistral guided generation using xgrammar by @juliendenize in #15704
  • [doc] update conda to usage link in installation by @reidliu41 in #15761
  • fix test_phi3v by @pansicheng in #15321
  • [V1] Override mm_counts for dummy data creation by @DarkLight1337 in #15703
  • fix: lint fix a ruff checkout syntax error by @yihong0618 in #15767
  • [Bugfix] Added embed_is_patch mask for fuyu model by @kylehh in #15731
  • fix: Comments to English for better dev experience by @yihong0618 in #15768
  • [V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request by @WoosukKwon in #15778
  • [Misc] update the comments by @lcy4869 in #15780
  • [Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup by @JenZhao in #15748
  • [Feature][ROCm]Enable fusion pass for torch.compile on ROCm by @charlifu in #15050
  • Recommend developing with Python 3.12 in developer guide by @hmellor in #15811
  • fix: better install requirement for install in setup.py by @yihong0618 in #15796
  • [V1] Fully Transparent Implementation of CPU Offloading by @youkaichao in #15354
  • [Model] Update support for NemotronNAS models by @Naveassaf in #15008
  • [Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats by @alex-jw-brooks in #15813
  • [Bugfix] Fix missing return value in load_weights method of adapters.py by @noc-turne in #15542
  • Upgrade transformers to v4.50.3 by @hmellor in #13905
  • [Bugfix] Check dimensions of multimodal embeddings in V1 by @DarkLight1337 in #15816
  • [V1][Spec Decode] Remove deprecated spec decode config params by @ShangmingCai in #15466
  • fix: change GB to GiB in logging close #14979 by @yihong0618 in #15807
  • [V1] TPU CI - Add basic perf regression test by @alexm-redhat in #15414
  • Fix Transformers backend compatibility check by @hmellor in #15290
  • [V1][Core] Remove unused speculative config from scheduler by @markmc in #15818
  • Move dockerfiles into their own directory by @hmellor in #14549
  • [Distributed] Add custom allreduce support for ROCM by @ilmarkov in #14125
  • Rename fallback model and refactor supported models section by @hmellor in #15829
  • [Frontend] Add Phi-4-mini function calling support by @kinfey in #14886
  • [Bugfix][Model] fix mllama multi-image by @yma11 in #14883
  • [Bugfix] Fix extra comma by @haochengxia in #15851
  • [Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding by @alexwl in #15824
  • [V1] TPU - Fix fused MOE by @alexm-redhat in #15834
  • [sleep mode] clear pytorch cache after sleep by @lionelvillard in #15248
  • [ROCm] Use device name in the warning by @gshtras in #15838
  • [V1] Implement sliding window attention in kv_cache_manager by @heheda12345 in #14097
  • fix: can not use uv run collect_env close #13888 by @yihong0618 in #15792
  • [Feature] specify model in config.yaml by @wayzeng in #15798
  • [Misc] Enable V1 LoRA by default by @varun-sundar-rabindranath in #15320
  • [Misc] Fix speculative config repr string by @ShangmingCai in #15860
  • [Docs] Fix small error in link text by @hmellor in #15868
  • [Bugfix] Fix no video/image profiling edge case for MultiModalDataParser by @Isotr0py in #15828
  • [Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE by @ruisearch42 in #15831
  • setup correct nvcc version with CUDA_HOME by @chenyang78 in #15725
  • [Model] Support Mistral3 in the HF Transformers format by @mgoin in #15505
  • [Misc] remove unused script by @reidliu41 in #15746
  • Remove format.sh as it's been unsupported >70 days by @hmellor in #15884
  • [New Model]: jinaai/jina-reranker-v2-base-multilingual by @noooop in #15876
  • [Doc] Quark quantization documentation by @cha557 in #15861
  • Reinstate format.sh and make pre-commit installation simpler by @hmellor in #15890
  • [Misc] Allow using OpenCV as video IO fallback by @Isotr0py in #15055
  • [ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork by @gshtras in #15820
  • Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. by @bnellnm in #13932
  • [CI/Build] Clean up LoRA tests by @jeejeelee in #15867
  • [Model] Aya Vision by @JenZhao in #15441
  • [Model] Add module name prefixes to gemma3 by @cloud11665 in #15889
  • [CI] Disable flaky structure decoding test temporarily. by @ywang96 in #15892
  • [V1][Metrics] Initial speculative decoding metrics by @markmc in #15151
  • [V1][Spec Decode] Implement Eagle Proposer [1/N] by @WoosukKwon in #15729
  • [Docs] update usage stats language by @simon-mo in #15898
  • [BugFix] make sure socket close by @yihong0618 in #15875
  • [Model][MiniMaxText01] Support MiniMaxText01 model inference by @ZZBoom in #13454
  • [Docs] Add Ollama meetup slides by @simon-mo in #15905
  • [Docs] Add Intel as Sponsor by @simon-mo in #15913
  • Fix input triton kernel for eagle by @ekagra-ranjan in #15909
  • [V1] Fix: make sure k_index is int64 for apply_top_k_only by @b8zhong in #15907
  • [Bugfix] Fix imports for MoE on CPU by @gau-nernst in #15841
  • [V1][Minor] Enhance SpecDecoding Metrics Log in V1 by @WoosukKwon in #15902
  • [Doc] Update rocm.inc.md by @chun37 in #15917
  • [V1][Bugfix] Fix typo in MoE TPU checking by @ywang96 in #15927
  • [Benchmark]Fix error message by @Potabk in #15866
  • [Misc] Replace print with logger by @chaunceyjiang in #15923
  • [CI/Build] Further clean up LoRA tests by @jeejeelee in #15920
  • [Bugfix] Fix cache block size calculation for CPU MLA by @gau-nernst in #15848
  • [Build/CI] Update lm-eval to 0.4.8 by @cthi in #15912
  • [Kernel] Add more dtype support for GGUF dequantization by @LukasBluebaum in #15879
  • [core] Add tags parameter to wake_up() by @erictang000 in #15500
  • [V1] Fix json_object support with xgrammar by @russellb in #15488
  • Add minimum version for huggingface_hub to enable Xet downloads by @hmellor in #15873
  • [Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key by @b8zhong in #15926
  • [CI] Remove duplicate entrypoints-test by @yankay in #15940
  • [Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. by @chaunceyjiang in #15938
  • [Metrics] Hide deprecated metrics by @markmc in #15458
  • [Frontend] Implement Tool Calling with tool_choice='required' by @meffmadd in #13483
  • [CPU][Bugfix] Using custom allreduce for CPU backend by @bigPYJ1151 in #15934
  • [Model] use AutoWeightsLoader in model load_weights by @lengrongfu in #15770
  • [Misc] V1 LoRA support CPU offload by @jeejeelee in #15843
  • Restricted cmake to be less than version 4 as 4.x breaks the build of… by @npanpaliya in #15859
  • [misc] instruct pytorch to use nvml-based cuda check by @youkaichao in #15951
  • [V1] Support Mistral3 in V1 by @mgoin in #15950
  • Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] by @hmellor in #15969
  • [V1][TPU] TPU-optimized top-p implementation (avoids scattering). by @hyeygit in #15736
  • [TPU] optimize the all-reduce performance by @yaochengji in #15903
  • [V1][TPU] Do not compile sampling more than needed by @NickLucche in #15883
  • [ROCM][KERNEL] Paged attention for V1 by @maleksan85 in #15720
  • fix: better error message for get_config close #13889 by @yihong0618 in #15943
  • [bugfix] add seed in torchrun_example.py by @youkaichao in #15980
  • [ROCM][V0] PA kennel selection when no sliding window provided by @maleksan85 in #15982
  • [Benchmark] Add AIMO Dataset to Benchmark by @StevenShi-23 in #15955
  • [misc] improve error message for "Failed to infer device type" by @youkaichao in #15994
  • [Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process by @wwl2755 in #15367
  • [doc] update contribution link by @reidliu41 in #15922
  • fix: tiny fix make format.sh excutable by @yihong0618 in #16015
  • [SupportsQuant] Bert, Blip, Blip2, Bloom by @kylesayrs in #15573
  • [SupportsQuant] Chameleon, Chatglm, Commandr by @kylesayrs in #15952
  • [Neuron][kernel] Fuse kv cache into a single tensor by @liangfu in #15911
  • [Minor] Fused experts refactor by @bnellnm in #15914
  • [Misc][Performance] Advance tpu.txt to the most recent nightly torch … by @yarongmu-google in #16024
  • Re-enable the AMD Testing for the passing tests. by @Alexei-V-Ivanov-AMD in #15586
  • [TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. by @vanbasten23 in #15732
  • [TPU] Switch Test to Non-Sliding Window by @robertgshaw2-redhat in #15981
  • [Bugfix] Fix function names in test_block_fp8.py by @bnellnm in #16033
  • [ROCm] Tweak the benchmark script to run on ROCm by @huydhn in #14252
  • [Misc] improve gguf check by @reidliu41 in #15974
  • [TPU][V1] Remove ragged attention kernel parameter hard coding by @yaochengji in #16041
  • doc: add info for macos clang errors by @yihong0618 in #16049
  • [V1][Spec Decode] Avoid logging useless nan metrics by @markmc in #16023
  • [Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt by @jonghyunchoe in #15939
  • [Hardware][Gaudi][BugFix] fix arguments of hpu fused moe by @zhenwei-intel in #15945
  • [Bugfix][kernels] Fix half2float conversion in gguf kernels by @Isotr0py in #15995
  • [Benchmark][Doc] Update throughput benchmark and README by @StevenShi-23 in #15998
  • [CPU] Change default block_size for CPU backend by @bigPYJ1151 in #16002
  • [Distributed] [ROCM] Fix custom allreduce enable checks by @ilmarkov in #16010
  • [ROCm][Bugfix] Use platform specific FP8 dtype by @gshtras in #15717
  • [ROCm][Bugfix] Bring back fallback to eager mode removed in #14917, but for ROCm only by @gshtras in #15413
  • [Bugfix] Fix default behavior/fallback for pp in v1 by @mgoin in #16057
  • [CI] Reorganize .buildkite directory by @khluu in #16001
  • [V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue by @njhill in #15906
  • [V1] Scatter and gather placeholders in the model runner by @DarkLight1337 in #15712
  • Revert "[V1] Scatter and gather placeholders in the model runner" by @ywang96 in #16075
  • [Kernel][Bugfix] Re-fuse triton moe weight application by @bnellnm in #16071
  • [Bugfix][TPU] Fix V1 TPU worker for sliding window by @mgoin in #16059
  • [V1][Spec Decode] Update N-gram Proposer Interface by @WoosukKwon in #15750
  • [Model] Support Llama4 in vLLM by @houseroad in #16104

New Contributors

  • @Shafi-Hussain made their first contribution in #15402
  • @oteroantoniogom made their first contribution in #15471
  • @cyyever made their first contribution in #15532
  • @dr75 made their first contribution in #15297
  • @wrmedford made their first contribution in #15511
  • @MattTheCuber made their first contribution in #15299
  • @jerryzh168 made their first contribution in #15575
  • @Avabowler made their first contribution in #15481
  • @zou3519 made their first contribution in #15494
  • @h-sugi made their first contribution in #15211
  • @clnperez made their first contribution in #15635
  • @kebe7jun made their first contribution in #14948
  • @CXIAAAAA made their first contribution in #15303
  • @lizzzcai made their first contribution in #14373
  • @simpx made their first contribution in #15716
  • @pengyuange made their first contribution in #15397
  • @pansicheng made their first contribution in #15321
  • @lcy4869 made their first contribution in #15780
  • @Naveassaf made their first contribution in #15008
  • @noc-turne made their first contribution in #15542
  • @ilmarkov made their first contribution in #14125
  • @kinfey made their first contribution in #14886
  • @haochengxia made their first contribution in #15851
  • @alexwl made their first contribution in #15824
  • @lionelvillard made their first contribution in #15248
  • @cha557 made their first contribution in #15861
  • @cloud11665 made their first contribution in #15889
  • @ZZBoom made their first contribution in #13454
  • @ekagra-ranjan made their first contribution in #15909
  • @chun37 made their first contribution in #15917
  • @cthi made their first contribution in #15912
  • @LukasBluebaum made their first contribution in #15879
  • @erictang000 made their first contribution in #15500
  • @yankay made their first contribution in #15940
  • @meffmadd made their first contribution in #13483
  • @lengrongfu made their first contribution in #15770
  • @StevenShi-23 made their first contribution in #15955

Full Changelog: v0.8.2...v0.8.3
