github verl-project/verl v0.7.1


Highlights

Model Engine

Megatron

  • Support R3 router replay with vLLM and SGLang #4840 #4986 #5185
  • Support MTP training in SFT/RL #4981 #4936
  • LoRA training enhancements with Megatron-Bridge: actor/ref sharing, LoRA-adapter-only refit, etc. #4673 #4632
  • Support Qwen3.5 series training with mbridge #5381
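
R3 router replay keeps MoE training numerically aligned with rollout: the expert routing decisions made by the inference engine are recorded and replayed in the training forward pass instead of being recomputed. A minimal sketch of the record/replay idea (illustrative only; the function and argument names below are hypothetical, not verl's actual API):

```python
import numpy as np

def topk_route(logits, k, replay=None):
    """Pick top-k experts per token, or replay previously recorded indices.

    logits: [num_tokens, num_experts] router scores
    replay: optional [num_tokens, k] expert indices recorded during rollout
    Returns (indices, weights): chosen expert ids and their softmax weights.
    """
    if replay is None:
        # Rollout: record the routing decision the inference engine made.
        indices = np.argsort(-logits, axis=-1)[:, :k]
    else:
        # Training: reuse rollout's expert choices instead of re-deciding,
        # while the gating weights still come from the training logits.
        indices = replay
    chosen = np.take_along_axis(logits, indices, axis=-1)
    weights = np.exp(chosen) / np.exp(chosen).sum(axis=-1, keepdims=True)
    return indices, weights

logits = np.random.randn(4, 8)                 # 4 tokens, 8 experts
recorded, _ = topk_route(logits, k=2)          # rollout: record the routing
replayed, _ = topk_route(logits + 0.01, k=2, replay=recorded)
assert (recorded == replayed).all()            # training replays the same experts
```

Even if the training logits drift slightly from the rollout logits, the replayed indices pin each token to the same experts, which is what keeps the two forward passes consistent.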

VeOmni

  • New VeOmni training backend with FSDP+SP+EP #4882

torchtitan

  • New torchtitan training backend with FSDP+TP+PP+CP+EP, roadmap: #5306

Rollout Engine

vLLM

  • Separate the model runner from the training process and refit weights via CUDA IPC #4280
  • FP8 rollout enhancement
  • Upgrade to vllm==0.17.0
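
FP8 rollout keeps inference weights in 8-bit floating point with a per-tensor scale, roughly halving rollout weight memory versus BF16 (several of the fixes below skip MoE router layers, which stay in high precision for accuracy). A crude fake-quantization sketch of the scale/round/rescale round trip; `E4M3_MAX` is the real e4m3 limit, but the uniform-grid rounding here only approximates true e4m3 behavior:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3

def fp8_fake_quantize(w):
    """Per-tensor scale, round onto a bounded grid, return grid values + scale."""
    scale = np.abs(w).max() / E4M3_MAX
    q = np.clip(np.round(w / scale), -E4M3_MAX, E4M3_MAX)
    return q.astype(np.float32), scale

def fp8_dequantize(q, scale):
    """Rescale back to the original range before the matmul."""
    return q * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = fp8_fake_quantize(w)
w_hat = fp8_dequantize(q, scale)
# Round-trip error is bounded by half a grid step (scale / 2).
assert np.abs(w_hat - w).max() <= scale / 2 + 1e-6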

SGLang

  • Support router replay
  • FP8 rollout enhancement
  • Upgrade to sglang==0.5.9

TensorRT-LLM

  • New TensorRT-LLM rollout backend, roadmap: #5042

Checkpoint Engine

  • Add checkpoint engine manager #5031
  • Add HCCL and Kimi checkpoint engine backends #4885 #4954
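
With several backends now available, the checkpoint engine manager's job is to resolve a configured backend name to a concrete engine. A hypothetical sketch of that registry pattern (class and backend names are illustrative and do not match verl's actual code):

```python
# Hypothetical sketch of a pluggable checkpoint-engine registry.
class CheckpointEngine:
    def save(self, state: dict, path: str) -> None:
        raise NotImplementedError
    def load(self, path: str) -> dict:
        raise NotImplementedError

_BACKENDS: dict = {}

def register(name: str):
    """Class decorator that files an engine under a backend name."""
    def deco(cls):
        _BACKENDS[name] = cls
        return cls
    return deco

@register("memory")
class InMemoryEngine(CheckpointEngine):
    """Stand-in backend; real ones would move bytes over a transport."""
    _store: dict = {}
    def save(self, state, path):
        self._store[path] = dict(state)
    def load(self, path):
        return self._store[path]

def get_engine(backend: str) -> CheckpointEngine:
    # Manager entry point: resolve a configured backend name to an engine,
    # failing loudly on unknown names.
    try:
        return _BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"unknown checkpoint engine backend: {backend}")
```

Adding a backend then means registering one more class rather than touching trainer code, which is the point of the manager abstraction.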

Trainer

  • One-step-off/fully async trainer refactor with verl-core
    • Unify checkpoint engine #5029
    • Unify partial rollout agent loop with auto resume #5487
    • Ascend NPU support for one-step-off/fully async
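
In the one-step-off pattern these features build on, rollout for step n+1 is produced while the trainer consumes step n, overlapping generation with training at a policy staleness bounded by one step. A toy schedule sketch (sequential here for clarity; the real trainer runs the producer concurrently, and `generate`/`train` are hypothetical stand-ins for verl's workers):

```python
from queue import Queue

def one_step_off(num_steps, generate, train):
    """Toy one-step-off schedule: rollout runs at most one step ahead.

    generate/train are hypothetical stand-ins for verl's rollout and
    trainer workers; shown sequentially here, concurrent in practice.
    """
    buf = Queue(maxsize=1)              # depth-1 buffer bounds staleness at 1
    buf.put(generate(step=0))           # warm up: produce the first batch
    for step in range(num_steps):
        batch = buf.get()               # batch made with a one-step-old policy
        if step + 1 < num_steps:
            # In the real async trainer this generation overlaps training.
            buf.put(generate(step=step + 1))
        train(batch)
```

The depth-1 queue is the key design choice: it lets generation and training proceed in parallel while guaranteeing the trainer never consumes data more than one policy version old.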

What's Changed

  • [ci] feat: add npu unit test by @yyyy2000 in #4626
  • [recipe] fix: workaround for making the one-step off-policy recipe compatible with IPv6 environments on Ascend NPU by @ji-huazhong in #4782
  • [fsdp] feat: integrate PrefixGrouper for GRPO training acceleration by @kevssim in #4368
  • [rollout] fix: use model_dump() for proper Pydantic serialization in token2text by @yurekami in #4706
  • [doc] chore: Change the name of npu unit test workflow by @yyyy2000 in #4800
  • [model] feat: support per sample temperature in trainer by @vermouth1992 in #4787
  • [tool] fix: add tools in single_turn_agent by @Junxiao-Zhao in #4798
  • [recipe] feat: migrate recipe to the dedicated repo verl-recipe as a submodule by @tongyx361 in #4795
  • [model] fix: fix temp dtype by @vermouth1992 in #4813
  • [vllm, sglang, rollout] fix: Fix a mistake when running run_qwen3_vl-30b-megatron.sh with latest verl and vllm0.12 by @cboss6 in #4810
  • [ckpt] feat: add checkpoint-engine abstraction by @wuxibin89 in #4775
  • [doc, ci] fix: Update Ascend doc and fix e2e_ascend CI by @FightingZhen in #4816
  • [trainer] feat: VeOmniEngine supports qwen3_vl ulysses by @A1waysBeenHere in #4806
  • [doc] chore: fix checkpoint engine image link by @wuxibin89 in #4821
  • [hardware] fix: automatically set device for SFT case by @A1waysBeenHere in #4828
  • [data] feat: TransferQueue - Update TransferQueue version and docs by @0oshowero0 in #4829
  • [doc] Update docs about fully_async_policy by @jsfanfanfan in #4826
  • [ckpt] fix: FSDP save ckpt after validation by @wdl339 in #4799
  • [perf] feat: Add MFU for Qwen3-VL dense by @zhihaofang1017 in #4753
  • [tool] fix: avoid nested ToolResponse in SandboxFusionTool by @Winston-Yuan in #4833
  • [vllm] fix: fix error in vllm patch for diff vllm version and add ci for moe with fp8 rollout by @Agoniii in #4824
  • [algo] feat: add optimal token baseline and variance proxy by @jiawei415 in #4678
  • [megatron] fix: Fix error in megatron workers by @zhihaofang1017 in #4832
  • [misc] feat: delete unnecessary base class in agent loop worker and vLLMHttpServer by @PeterSH6 in #4838
  • [misc] feat: consolidate tensordict before dispatch by @vermouth1992 in #4830
  • [training_utils] fix: json encode error in filelogger by @zhuangqh in #4811
  • [ckpt] chore: skip saving hf_checkpoint during megatron+lora training & add a separate lora merge script by @Junxiao-Zhao in #4839
  • [rollout, vllm] fix: accuracy issue in verl serve mode + vllm-ascend + dp + ep + tp scenarios by @leo-pony in #4783
  • [fsdp] feat: add validate process on trainer node when use_trainer_do_validate=True by @chenjiaoAngel in #4683
  • [misc] fix: recipe submodule accidentally been removed by @wuxibin89 in #4843
  • [worker, training_utils] fix: Engine Metric Aggregation by @JacobHelwig in #4778
  • [rollout] fix: configurable agent loop + multimodal data for fully-async by @XChen-Zero in #4842
  • [ci] test: switch the vlm rl test case in the npu environment to use the model engine by @ji-huazhong in #4844
  • [ckpt] fix: Megatron save ckpt after validation by @wdl339 in #4841
  • [megatron] feat: Share actor and ref in LoRA by @HollowMan6 in #4673
  • [fsdp, megatron] fix: Engine Rollout Worker LoRA Parameter Update by @JacobHelwig in #4836
  • [algo, rollout, sglang] feat: Support router replay with sglang by @moehanabi in #4840
  • [perf] feat: Add MFU for Qwen3-VL MoE by @zhihaofang1017 in #4859
  • [misc] fix: fix 3d position_ids for train_mini_batch by @wdl339 in #4860
  • fix(sft_trainer): Fix global_tokens and total_tokens metrics always showing 0.0 by @khazic in #4854
  • [rollout,vllm] feat: support vllm scheduling policy config and generate setting priority by @RobotGF in #4874
  • [ckpt] fix: prevent data loss when max_ckpt_to_keep=1 by @jreiml in #4873
  • [worker] feat: New engine share actor and ref for LoRA by @HollowMan6 in #4867
  • [worker] fix: new engine saves megatron LoRA adapters checkpoints by @HollowMan6 in #4866
  • [ckpt] fix: properly handle optimizer offloading for HybridDeviceOptimizer by @jreiml in #4870
  • [doc] chore: update verl meetup by @wuxibin89 in #4884
  • [vllm] Fix CLI argument serialization for list types by @jreiml in #4869
  • [data] fix: build_messages for multi-modal data by @ccilery in #4864
  • [rollout] fix: wrong display about Prometheus when using SGLang. by @jsfanfanfan in #4858
  • [vllm] fix: pad data_hp to be multiples of block_size by @Agoniii in #4835
  • [ci] feat: add ci to automatically submit PR request if precommit fails by @vermouth1992 in #4878
  • [doc] chore: async README backticks by @JacobHelwig in #4898
  • [doc] fix: correct typo in script comment by @Prozac614 in #4900
  • [veomni] refactor: minor refactoring to ensure veomni engine compatibility with forward_only mode by @ji-huazhong in #4889
  • [sglang] fix: sglang TP+DP support / port bug by @hustmf in #4715
  • Either remove + prefix: 'actor_rollout_ref.model.enable_activation_of… by @Tomsawyerhu in #4910
  • [vllm] fix: vllm_config arg gets removed in newer WorkerWrapperBase by @HollowMan6 in #4915
  • Correct Attention FLOPS estimation in flops_counter.py by @HaochenYuan in #4929
  • [algo, doc] feat: trust region sequence masking - (1) k3 KL avg and (2) veto for max criterion by @szrlee in #4544
  • [rollout] feat: use rollout and validate parallel process by @chenjiaoAngel in #4863
  • [model] feat: Add qwen3_vl_moe in VL_TYPE2INDEX for image_mask and vedio_mask computation by @A1waysBeenHere in #4923
  • [data] feat: TransferQueue - fix rm_score error of TransferQueue by @baymax591 in #4928
  • [vllm] fix: Update get_encoding import for vllm versions 0.13.0 and above by @xhx1022 in #4934
  • [ci] fix: Add hydra-core to pre-commit installation by @vermouth1992 in #4892
  • [data] feat: TransferQueue - Unify the return of reward by @walterchenchn in #4902
  • Revert "Correct Attention FLOPS estimation in flops_counter.py" by @vermouth1992 in #4937
  • [misc] fix: Correct Docstring arg in main() (PPO trainer) by @rfy48 in #4943
  • [misc] fix: resolve pre-commit hook execution errors by @ji-huazhong in #4941
  • [model] fix: qwen3-vl-30b npu_patch fix by @bjf-frz in #4888
  • [veomni] feat: support offloading/loading the veomni model/optimizer by @ji-huazhong in #4916
  • [rollout,mbridge] feat: add metrics for rollout num preempted and fix mbridge freeze moe by @RobotGF in #4956
  • [single_controller] fix: pass max_colocate_count and detached params when merging RayResourcePool by @wdl339 in #4949
  • [single_controller] feat: Support dispatch/collect nested tensors with 3 or more dimensions by @JacobHelwig in #4940
  • [env] fix: upgrade torch, cudnn and deps versions in vllm image to fix performance issue by @Begunner in #4960
  • [training_utils] fix: correctly _resolve_device when not specified by @HollowMan6 in #4961
  • [trainer] fix: pass scores device type to group_mean_std call by @HollowMan6 in #4962
  • [training_utils] fix: Correct Attention TFLOPS estimation & fix CI by @HaochenYuan in #4959
  • [training_utils] fix: cover device selection bug in group statistics with tests by @JohnConnor123 in #4967
  • [doc, data] fix: resolve broken documentation hyperlinks by @aphrodite1028 in #4970
  • [sglang, rollout] feat: support sglang as rollout engine in fully async policy by @AniZpZ in #4191
  • [megatron] feat: Using MTP in RL Training and Inference by @ArronHZG in #4936
  • [megatron] fix: fix megatron sync_weights oom on user_trainer_do_validate mode by @chenjiaoAngel in #4944
  • [rollout,vllm] fix: num_preempted metrics fix and typo correction in vllm async server by @RobotGF in #4976
  • [ray,rollout,trtllm] feat: Adding tensorrt_llm as new rollout engine by @joyang-nv in #4665
  • [data] fix: use lazy import for qwen_vl_utils in vision_utils.py by @Wheeeeeeeeels in #4991
  • [recipe,tool] feat: make GSM8K multiturn tool quickstart actually work by @letsgetai in #4998
  • [perf] feat: verl profiler system support Agent Loop scenario and integrate torch.profiler by @mengchengTang in #4320
  • [vllm] fix: vllm TP+DP support bug by @ccilery in #4969
  • [fsdp] fix: use module instead of function for fully_shard_module by @moaead in #5002
  • [misc] fix: update version in the main branch by @yyDing1 in #5006
  • [ckpt] feat: add Hccl ckpt engine backend by @hanhan-networking in #4885
  • [reward] fix: conditionally include reward_extra_keys in meta_info based on rm_scores presence by @none0663 in #5005
  • [rollout, vllm, sglang] fix: set default max_model_len by @ji-huazhong in #5018
  • [ci] fix: fix ci by @vermouth1992 in #5022
  • [ckpt] fix: npu load checkpoint by @Li-Yongwen in #4938
  • [megatron] fix: patch mcore for MLA support with flash_attn by @HollowMan6 in #4931
  • [BREAKING][worker, rollout, vllm] feat: implement vLLM colocated training-inference rollout with process separation by @jianjunzhong in #4280
  • [megatron] feat: LoRA adapter only refit (TensorLoRARequest) by @HollowMan6 in #4632
  • [veomni] refactor: no longer check the attn_implementation/moe_implementation in VeOmniEngineConfig by @A1waysBeenHere in #5019
  • [veomni] feat: support model resharding between veomni and rollout engine by @ji-huazhong in #5033
  • [trtllm] fix: Fixes for TRTLLM rollout by @hchings in #5032
  • [ci] fix: docker transformers==4.57.6 by @yyyy2000 in #5053
  • [rollout] feat: set max_model_len by max_model_len or use max_position_embedding by @RobotGF in #5052
  • [ckpt] feat: add CheckpointEngineManager by @wuxibin89 in #5031
  • [doc] feat: add npu gspo practice by @wucong25 in #4988
  • [ci] chore: move to verl-project by @wuxibin89 in #5059
  • [vllm, sglang] feat: opt for FP8 rollout memory by @Agoniii in #4997
  • [model] feat: add API to support automatically support engine backend by @vermouth1992 in #5050
  • [megatron, training_utils] fix: Patch MoEAlltoAllTokenDispatcher.preprocess for router replay by @HollowMan6 in #4986
  • [rollout, perf, cfg] fix: Add global step info and support more profile control params for rollout profiling (sglang backend) by @bithighrr in #5025
  • [fsdp, megatron] feat: Support fully-async training on Ascend NPU by @acat-rw in #5043
  • [doc, trainer] fix: shouldn't use rollout routing replay data for R2 by @HollowMan6 in #4973
  • [doc] feat: add dapo multi model optimization practice by @ChibiQuest in #5044
  • [ci] chore: fix ci failure by @wuxibin89 in #5068
  • [ci] chore: fix npu ci failure by @wucong25 in #5064
  • [sglang,ci,doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version for VeRL + Sglang by @xiazhahe in #5065
  • [megatron, training_utils] fix: router replay R3 align router replay data with global layer indices by @HollowMan6 in #5037
  • [trainer] fix: resolve dataset config in agent loop by @yyDing1 in #5034
  • [ckpt,rollout] fix: sleep_replicas before save_ckpt to avoid OOM by @RobotGF in #5079
  • [reward, ci] fix: colocate reward model ci break by @yyDing1 in #5084
  • [reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True by @none0663 in #5054
  • [rollout] fix: fix cpu allocation error in tensorrt_llm rollout manager by @SchumiDing in #5085
  • Revert "[reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True" by @wuxibin89 in #5091
  • [trtllm] fix: minor fixes to trtllm rollout by @hchings in #5095
  • [sglang] feat: add NPU GRPO training scripts for Qwen2.5-32B (FSDP/SGLang backends) by @xiazhahe in #5062
  • [doc] chore: update arch image by @wuxibin89 in #5106
  • [rollout] feat: automatically resume generation on abort by @wuxibin89 in #5071
  • [sglang, doc] feat: add NPU GRPO training scripts for Qwen3-30B (Megatron/SGLang backends) and update doc by @hustmf in #5060
  • [megatron] fix: megatron async save ckpt fix by @Leem-Li in #5016
  • [model] feat: add NPU GRPO training scripts for Qwen2.5-32B/Qwen3-30B (Megatron/vLLM backends) by @psyloy in #4984
  • [data] feat: Add support for Llama3.2-11-b-vision by @SchumiDing in #5112
  • [vllm] feat: revert to the default behavior of cudagraph_mode by @vermouth1992 in #5109
  • [fsdp] fix: Handle different transformers versions for Vision2Seq models in FSDP model merger by @liangxuZhang in #5108
  • [megatron] feat: Support MTP training in SFT by @arvyanh in #4981
  • [sglang] fix: update wiki to support speculative decode rollout by @ArronHZG in #5116
  • [training_utils] fix: add upcasting for seq_len_effective to avoid potential overflow in calculate_workload by @albertcity in #5110
  • [ci] feat: add npu workflow,e2e_sft_llm&model&reward_model_vllm by @yyyy2000 in #5039
  • [doc] chore: update readme for SPEAR algorithm by @yuleiqin in #5124
  • [vllm] feat: add shared memory support for weight transfer and IPC support checks by @jianjunzhong in #5089
  • Revert "[rollout] feat: automatically resume generation on abort" by @PeterSH6 in #5127
  • [sglang] fix: skip MoE router layers for FP8 quantization by @eternally-z in #5122
  • [vllm] feat: get gpt-oss encoding on demand by @vermouth1992 in #5131
  • [rollout,vllm] feat: revert default value of max_num_seqs by @RobotGF in #5139
  • [misc] feat: Modify transformers dependency version in requirements by @vermouth1992 in #5141
  • [ci] fix: fix ci by downgrade transformers to <5 by @vermouth1992 in #5143
  • [misc] chore: rename huggingface-cli to hf to favor transformers v5 by @vermouth1992 in #5145
  • [worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode by @RobotGF in #5144
  • [megatron] fix: patched_routing accepts arbitrary args by @HollowMan6 in #5155
  • Revert "[worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode" by @vermouth1992 in #5156
  • [model, algo] feat: implement SAC algorithm and support Pi0.5 model by @Miical in #5118
  • [training_utils] fix: Resolved bugs and conflicts in the fully async caused by multiple PRs by @ZLiao097 in #5100
  • [reward] feat: split reward loop manager and agent loop manager by @yyDing1 in #5134
  • [Doc] feat: update README to add new awesome project RuleReasoner by @jacklanda in #5157
  • [trainer] feat: move save_ckpt before update_weights and validate by @RobotGF in #5137
  • [ci,doc] feat: Add ascend_ci_guide by @yyyy2000 in #5163
  • [rollout] fix: remove dtype cast by @vermouth1992 in #5117
  • [hardware] fix: update architecture check and CANN toolkit path retrieval in device.py by @jianjunzhong in #5142
  • [vllm] fix: build_app() missing 1 required positional argument: 'supported_tasks' by @HollowMan6 in #5093
  • [perf] feat: clear megatron global buffer memory by @wuxibin89 in #5173
  • [vllm, rollout] fix: Use different seeds for vllm by @victordion in #5179
  • [ci] fix: fully async ci break by @yyDing1 in #5166
  • [vllm] feat: make seed configurable and different among replicas by @vermouth1992 in #5181
  • [data] fix: keyword video_metadata by @sophiayyya in #5177
  • [megatron] fix: checkpoints uses fully_reshardable by default when supported by @HollowMan6 in #5154
  • [trtllm, rollout] test: add unittest by @hchings in #5102
  • [reward] refactor: migrate all reward managers to the new asynchronous reward manager by @yyDing1 in #5189
  • [vllm] fix: handle multimodal inputs correctly in full async mode by @Silas-11 in #5160
  • [megatron] feat: fused kernel support for new model engine by @HollowMan6 in #5191
  • [fsdp] feat: Merge lora in fsdp training to speed up rollout by @amzfang in #5115
  • [megatron] Add Megatron-Bridge support in fully async policy by @eternally-z in #5196
  • [perf] fix: infer server profiler args fix by @mengchengTang in #5121
  • [doc, perf] feat: add perf_tuning_on_ascend by @tardis-key in #5104
  • [ci] feat: add three npu workflow yml test by @daikang6 in #4978
  • [vllm] fix: ignore MoE router layers for FP8 quantization by @zpqiu in #5107
  • [worker, training_utils] fix: Metric Aggregation Across DP Ranks by @JacobHelwig in #5203
  • [megatron] fix: add protections for logits_processor_args.pop("loss_mask"), which may cause the forward_fn of value net collapse by @albertcity in #5204
  • [trtllm] fix: reduce peak mem usage during update_weight() by @hchings in #5212
  • [algo] feat: support rollout router replay in MegatronEngine by @xhx1022 in #5185
  • [trtllm] fix: add synchronization before resume kv_cache to prevent oom in non-leader ranks by @shuyixiong in #5208
  • [perf] feat: add images_seqlens on mfu calculation for engine_worker by @alwaysyiyu in #5207
  • [reward] fix: Add assert to prevent reward NaN caused by overlong_cfg.len=0 by @ZLiao097 in #5216
  • [recipe] refactor: refactor ray trainer for separate recipe use. (fully async / one step off) by @ArronHZG in #5184
  • [BREAKING][reward] refactor: remove reward model worker code and invocation by @yyDing1 in #5194
  • [fsdp] fix: Support trust_remote_code during FSDP HuggingFace checkpoint save by @thvasilo in #5200
  • [worker] feat: Avoid redundant base weight sync when engine doesn't sleep by @JohnConnor123 in #5147
  • [ci] chore: fix npu ci by @wucong25 in #5218
  • [vllm] fix: apply moe weight loader patch for standard weight loading by @zjchenn in #5234
  • [ci] chore: fix npu ci setuptools by @yyyy2000 in #5238
  • [ci] chore: fix npu ci setuptools, keep update pip and packaging by @yyyy2000 in #5239
  • [reward] fix: preserve input non_tensor_batch in AgentLoopManager when reward_loop_worker_handles is None by @none0663 in #5195
  • [perf] fix: fix npu profiling scripts by @tongtong0613 in #5226
  • [megatron] feat: use yaml to manage mbridge args by @Kite0011 in #4584
  • [algo] feat: reduce routed expert padding via NestedTensor and uint8 dtype by @xhx1022 in #5240
  • [ray,trainer] feat: add master port range configuration for port range by @RobotGF in #5201
  • [BREAKING][reward] refactor: deprecate batch reward manager by @yyDing1 in #5237
  • [fsdp] feat: add script for qwen3next training on npu platform by @zjchenn in #5236
  • [doc] fix: Update ascend_sglang_best_practices.rst by @hustmf in #5261
  • [vllm, rollout] feat: update abort function with vllm internal pause_generation api by @PeterSH6 in #5253
  • [veomni, trainer] feat: add rl support for veomni backend by @ji-huazhong in #4882
  • [vllm] fix: run post-load weight processing once after async IPC sync by @zjchenn in #5235
  • [doc] chore: version of dapo_multi_model_optimization_practice by @ChibiQuest in #5263
  • [rollout] feat: make more rollout flags configurable to trtllm backend by @Superjomn in #5258
  • [doc] refactor: update reward documents by @yyDing1 in #5272
  • [doc] chore: Ascend retool practice doc by @LeoYao123 in #5266
  • [vllm, rollout] fix: auto-downgrade cudagraph_mode to PIECEWISE when DCP is enabled by @Siritao in #5262
  • [fsdp, veomni, trainer] fix: restrict npu-patch scope to avoid veomni backend interference by @ji-huazhong in #5268
  • [BREAKING][reward] refactor: the full reward configuration by @yyDing1 in #5255
  • [ci] chore: delete redundant npu ci by @yyyy2000 in #5259
  • [fsdp, megatron] refactor: Refactor Fully Async Implementation via Engine Workers by @ZLiao097 in #5269
  • [megatron, model] chore: add example of nemotron nano v3 by @ISEEKYAN in #5284
  • [misc] chore: fix veomni_trainer.yaml by @wuxibin89 in #5285
  • [megatron] fix: fallback to moe_router_padding_for_fp8 in router replay patch by @xhx1022 in #5283
  • [reward] fix: backward compatibility with old reward config by @yyDing1 in #5287
  • [reward] fix: reward model args and reward_kwargs bug by @yyDing1 in #5289
  • [doc] chore: gspo update config and add version with npu by @chengminhua in #5279
  • [fsdp,veomni] fix: remove FSDPUlyssesShardingManager to make eval_mode/train_mode reentrant by @wuxibin89 in #5305
  • [veomni] refactor: Modify dp related parameters to align with FSDP backend and remove temporarily unsupported TP/PP/CP parameters by @ChengQianqian in #5303
  • [trtllm] feat: use max utilization scheduler by default by @tongyuantongyu in #5302
  • [worker, tool] fix: stabilize agent loop extra fields schema by @denismegerle in #5301
  • [algo] feat: add NPU SAPO training script for Qwen3-8B (FSDP/vLLM backends) by @Vvictorrrr in #5257
  • [fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-8B (FSDP/VLLM backends) by @zhihaofang1017 in #5250
  • [fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends) by @alwaysyiyu in #5260
  • [model,cfg] fix: type annotation for Lora target_modules by @thvasilo in #5223
  • [megatron] feat: Support LoRA training with FP16 using Megatron-Bridge. by @xichengpro in #4648
  • [ci] fix: main pre-commit by @pengwu22 in #5318
  • [misc] refactor: delete remaining batch-mode code in single controller by @ji-huazhong in #5319
  • [rollout] fix: make skip rollout compatible with async mode by @ChengQianqian in #5320
  • [veomni, trainer] fix: padding pixel value with padding_scale for vl model by @A1waysBeenHere in #5322
  • [fsdp,algo] feat: add NVFP4 QAT (Quantization-Aware Training) support by @zhangyimi in #5190
  • [docs] Add new awesome work using Verl by @MING-ZCH in #5328
  • [vllm] feat: remove workers from vLLMHttpServer by @tongyx361 in #5330
  • Revert "[vllm] feat: remove workers from vLLMHttpServer" by @PeterSH6 in #5333
  • [misc] refactor: remove deprecated codes by @ji-huazhong in #5336
  • [misc] fix: include config files for experimental entrypoints in package data by @guillemgt in #5343
  • [ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile by @ji-huazhong in #5345
  • Revert "[ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile" by @ji-huazhong in #5353
  • [reward] fix: empty class_dict for standalone reward model resource pool by @yyDing1 in #5348
  • [trainer] feat: Add Torchtitan as alternative training engine by @acisseJZhong in #5051
  • [training_utils] fix: mask out-of-bounds vocab entries fused kernel LCE logsumexp by @EricMarcus-ai in #5349
  • [rollout] fix: Include routed_experts in ToolAgentLoop return value to support R3 router replay by @mirrorboat in #5368
  • [misc] fix: pass torch dtype when init random model by @HollowMan6 in #5370
  • [ci] chore: pin version cupy-cuda12x==13.6.0 by @wuxibin89 in #5377
  • [doc] chore: ascend add performance analysis guide and update some version info by @chengminhua in #5324
  • [trainer] feat: Support RL trainer with TorchtitanEngine by @acisseJZhong in #5356
  • [algo] feat: Exception for agg_loss when dp_size > 1 but global information is absent & fix: correct & consistent loss aggregation for "seq-mean-token-sum-norm" by @tongyx361 in #5366
  • [rollout] fix: make run_uvicorn behavior more reliable by @tongyuantongyu in #5383
  • [doc] feat: update documentation for The Optimal Token Baseline and Rollout Correction by @jiawei415 in #5380
  • [trainer] refactor: remove fsdp_sft_trainer.py by @wuxibin89 in #5382
  • [ci] fix: occasional CI failures caused by sglang server port conflicts by @pengwu22 in #5310
  • [fsdp] fix: add aggressive_empty_cache at end of init_model to prevent vLLM OOM by @EricMarcus-ai in #5384
  • [doc, worker] feat: Enable Megatron-Bridge for MTP by @HollowMan6 in #5323
  • [ckpt] feat: add kimi ckpt engine backend by @kip-cxj in #4954
  • [misc] feat: ignore pyrightconfig.json to allow users to customize pyrightconfig to fix breaks by @tongyx361 in #5385
  • [ci] chore: update triton-ascend and fix npu ut by @yyyy2000 in #5396
  • [fsdp, megatron] feat: refactor fully-async and one-step-off training to support multiple checkpoint engine backends by @Shangwei-Li in #5029
  • [doc] feat: add fully async and one step off to PR Checklist by @ArronHZG in #5404
  • [doc] chore: ascend update gspo optimization practice document by @chengminhua in #5408
  • [algo] feat: add DPPO with binary TV or binary KL implementation by @QPHutu in #5397
  • [doc] chore: npu best practice doc by @hustmf in #5415
  • [algo] fix: seq mean and default scale factor loss_mask.shape[-1] as in seq-mean-token-sum-norm by @tongyx361 in #5417
  • [megatron] fix: missing model offload to CPU for forward_only mode by @xhx1022 in #5406
  • [megatron] feat: enhance model offloading and loading for frozen parameters by @RobotGF in #5412
  • [perf] fix: the overwritten of Torch_profile with multi steps. by @Rhetee in #5395
  • [trainer] feat: add padding for tensor alignment in preprocess_thd_no_padding function by @RobotGF in #5410
  • [tool] fix: handle empty image inputs in ToolAgentLoop by @denismegerle in #5420
  • [rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter by @denismegerle in #5359
  • [tool] refactor: remove tool schema plumbing from SingleTurnAgentLoop by @denismegerle in #5425
  • [misc] feat: Add code for data grouping in no-padding scenario by @Kite0011 in #5424
  • [doc] add Dr. MAS to awesome work by @langfengQ in #5427
  • [BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout by @wuxibin89 in #5418
  • [ci] chore: bump the version of vllm-ascend to v0.11.0 in the ascend dockerfile by @ji-huazhong in #5431
  • [doc] chore: fix npu docs by @wucong25 in #5428
  • [doc] fix: fix npu retool docs by @LeoYao123 in #5449
  • [data] refactor: TransferQueue - retire legacy integration codes by @0oshowero0 in #5454
  • [ci] fix: failed trtllm_unit_tests with attribute error by @HollowMan6 in #5446
  • [megatron] fix: pass dp_group to rearrange_micro_batches to fix DeepEP timeout by @xhx1022 in #5451
  • [rollout] fix: remove unexpected concurrency bound at 1000 by @tongyuantongyu in #5402
  • [data] fix: accept jsonl dataset files by @zqzten in #5456
  • [single_controller] refactor: use BatchData to simplify concat and chunk in single_controller by @zw0610 in #5450
  • [megatron] feat: Support DSA indexer LoRA mappings by @HollowMan6 in #5462
  • [doc] fix: fix typo in agentic rl doc by @KevinZeng08 in #5461
  • [misc] chore: support transformers 5 by @HollowMan6 in #5445
  • [doc] fix: fix dapo multi model practice by @ChibiQuest in #5453
  • [trainer] feat: Update trainer API for TorchtitanEngine by @acisseJZhong in #5457
  • [rollout] refactor: bucketed transfer utils by @pengwu22 in #5309
  • [rollout] feat: update trtllm docker by @Superjomn in #5386
  • [doc] fix: fix npu retool doc by @LeoYao123 in #5467
  • [ckpt] feat: add mooncake backend by @x1314aq in #5176
  • [doc] chore: add ascend backend feature by @wucong25 in #5466
  • [megatron] fix: support hybrid dense/MoE models in router replay with PP/VPP by @xhx1022 in #5452
  • [megatron] fix: patch support newer mcore version by @HollowMan6 in #5372
  • [ci] fix: sanity issue related to Last updated string by @HollowMan6 in #5477
  • [rollout] feat: support auto resume on abort in FullyAsyncLLMServerManager by @wuxibin89 in #5430
  • [trainer] feat: Support EP with TorchtitanEngine by @acisseJZhong in #5469
  • docs: fix typo in kl_penalty docstring by @ZHAOoops in #5481
  • [megatron] fix: add FP8 block quantization padding for EngineWorker by @zpqiu in #5440
  • [ckpt, model] fix: preserve lora_alpha in model_merger via training meta by @Yatogaii in #5326
  • [fsdp,algo] feat: Support QAT (NVFP4) in FSDPEngine for the unified engine_workers architecture by @zhangyimi in #5411
  • [doc] feat: add mtp spec log by @ArronHZG in #5491
  • [reward] feat: add example scripts for reward model usage by @yyDing1 in #5486
  • [BREAKING][trtllm] feat: Add FP8 refit support for trtllm rollout by @shuyixiong in #5374
  • [veomni,ci] fix: Modify default setting in veomni test scripts to prevent misunderstanding by @0oshowero0 in #5484
  • [ckpt] fix: test issues of kimi and mooncake backend by @x1314aq in #5500
  • [doc] chore: update FP8 guide with E2E training section and reorganization by @zpqiu in #5502
  • [model,doc] feat: add qwen3 32B megatron 1k to 256k by @ChibiQuest in #5497
  • [doc] chore: npu docker support vllm013 by @yyyy2000 in #5471
  • [doc] fix: update recipe link to fix 404 not found by @tardis-key in #5286
  • [ci] feat: add npu nightly ci by @daikang6 in #5225
  • [data] fix: use %-style format placeholders in logger.warning() by @cavities12 in #5512
  • [rollout] feat: global request-level load balancer single source routing by @aoshen524 in #5399
  • [rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in #5149
  • Revert "[rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout" by @wuxibin89 in #5525
  • [ckpt] fix: Fix checkpoint engine backend unset error by @ZLiao097 in #5473
  • [rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in #5528
  • [Megatron] feat: Support routing replay on NPU with performance and compatibility enhancements by @755651978 in #5298
  • [rollout] fix: update checkpoint_engine bucket size parameter for Ascend compatibility by @nuerxiati in #5539
  • [misc] feat: support dynamic bsz using group size by @Kite0011 in #5438
  • [fully_async, one_step_off] feat: support auto resume on abort when using fully_async by @ArronHZG in #5487
  • [doc] chore: add note for kimi ckpt engine by @kip-cxj in #5546
  • [perf, trainer, training_utils] fix: Try to monitor with mlflow up to 3 times, and avoid duplicate key processing in each step by @sheilaliuxl in #5548
  • [trainer] fix: support nsys when using sft_trainer_ray.py by @arvyanh in #5489
  • [rollout] fix: reintroduce NCCL_CUMEM_ENABLE for weight synchronization in async rollout environments by @RobotGF in #5522
  • [ci] feat: npu nightly ci log is redirected to the specified directory by @daikang6 in #5557
  • [ci] fix: sft_trainer_ray ci break by @wuxibin89 in #5562
  • [fsdp] fix: wrap embed_tokens/lm_head by name for peft models by @cavities12 in #5516
  • [ci] chore: update npu ci to vllm013 by @yyyy2000 in #5523
  • [algo] feat: support router replay in MegatronEngine by @xhx1022 in #5219
  • [docker] feat: update stable image to vllm==0.17.0, sglang==0.5.9 by @Begunner in #5542
  • [megatron, model] feat: qwen3.5 example by @ISEEKYAN in #5381
  • [algo] feat: add GDPO (Group reward-Decoupled Normalization Policy Optimization) algorithm by @Rhetee in #5503
  • [megatron] feat: model engine support mtp by @ArronHZG in #5561
  • [doc] fix: fix te pip install instructions by @TKONIY in #5501
  • [rollout] fix: agent loop copy read-only routed_experts before torch conversion by @HollowMan6 in #5519
  • [ci] chore: change machine for npu ci by @yyyy2000 in #5578
  • [megatron] fix: apply override_transformer_config inside mindspeed engine to avoid conflict with other training engines by @ChengQianqian in #5589
  • [rollout] fix: fix some compatibility issues with qwen vl series support of trtllm rollout by @SchumiDing in #5583
  • [misc] chore: bump version to 0.7.1 by @wuxibin89 in #5602

New Contributors

  • @yyyy2000 made their first contribution in #4626
  • @Junxiao-Zhao made their first contribution in #4798
  • @cboss6 made their first contribution in #4810
  • @wdl339 made their first contribution in #4799
  • @zhihaofang1017 made their first contribution in #4753
  • @Winston-Yuan made their first contribution in #4833
  • @jiawei415 made their first contribution in #4678
  • @XChen-Zero made their first contribution in #4842
  • @khazic made their first contribution in #4854
  • @jreiml made their first contribution in #4873
  • @Prozac614 made their first contribution in #4900
  • @hustmf made their first contribution in #4715
  • @Tomsawyerhu made their first contribution in #4910
  • @xhx1022 made their first contribution in #4934
  • @walterchenchn made their first contribution in #4902
  • @rfy48 made their first contribution in #4943
  • @bjf-frz made their first contribution in #4888
  • @JohnConnor123 made their first contribution in #4967
  • @aphrodite1028 made their first contribution in #4970
  • @AniZpZ made their first contribution in #4191
  • @joyang-nv made their first contribution in #4665
  • @Wheeeeeeeeels made their first contribution in #4991
  • @letsgetai made their first contribution in #4998
  • @moaead made their first contribution in #5002
  • @hanhan-networking made their first contribution in #4885
  • @Li-Yongwen made their first contribution in #4938
  • @jianjunzhong made their first contribution in #4280
  • @hchings made their first contribution in #5032
  • @bithighrr made their first contribution in #5025
  • @ChibiQuest made their first contribution in #5044
  • @xiazhahe made their first contribution in #5065
  • @SchumiDing made their first contribution in #5085
  • @psyloy made their first contribution in #4984
  • @liangxuZhang made their first contribution in #5108
  • @arvyanh made their first contribution in #4981
  • @albertcity made their first contribution in #5110
  • @yuleiqin made their first contribution in #5124
  • @eternally-z made their first contribution in #5122
  • @Miical made their first contribution in #5118
  • @jacklanda made their first contribution in #5157
  • @victordion made their first contribution in #5179
  • @sophiayyya made their first contribution in #5177
  • @Silas-11 made their first contribution in #5160
  • @amzfang made their first contribution in #5115
  • @daikang6 made their first contribution in #4978
  • @shuyixiong made their first contribution in #5208
  • @alwaysyiyu made their first contribution in #5207
  • @thvasilo made their first contribution in #5200
  • @Superjomn made their first contribution in #5258
  • @LeoYao123 made their first contribution in #5266
  • @Siritao made their first contribution in #5262
  • @ChengQianqian made their first contribution in #5303
  • @tongyuantongyu made their first contribution in #5302
  • @denismegerle made their first contribution in #5301
  • @Vvictorrrr made their first contribution in #5257
  • @zhangyimi made their first contribution in #5190
  • @MING-ZCH made their first contribution in #5328
  • @guillemgt made their first contribution in #5343
  • @acisseJZhong made their first contribution in #5051
  • @mirrorboat made their first contribution in #5368
  • @kip-cxj made their first contribution in #4954
  • @QPHutu made their first contribution in #5397
  • @Rhetee made their first contribution in #5395
  • @zqzten made their first contribution in #5456
  • @KevinZeng08 made their first contribution in #5461
  • @x1314aq made their first contribution in #5176
  • @ZHAOoops made their first contribution in #5481
  • @Yatogaii made their first contribution in #5326
  • @cavities12 made their first contribution in #5512
  • @755651978 made their first contribution in #5298
  • @sheilaliuxl made their first contribution in #5548
  • @TKONIY made their first contribution in #5501

Full Changelog: v0.7.0...v0.7.1