github verl-project/verl v0.7.1


Highlights

Model Engine

Megatron

  • Support R3 router replay with vLLM and SGLang #4840 #4986 #5185
  • Support MTP training in SFT/RL #4981 #4936
  • LoRA training enhancements with Megatron-Bridge: actor/ref sharing, LoRA-adapter-only refit, etc. #4673 #4632
  • Support Qwen3.5 series training with mbridge #5381
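
R3 router replay keeps MoE training numerically aligned with rollout: the expert routing decisions made by the inference engine are recorded and replayed in the training forward pass instead of being recomputed. A minimal sketch of the record/replay idea (illustrative only; the function and argument names below are hypothetical, not verl's actual API):

```python
import numpy as np

def topk_route(logits, k, replay=None):
    """Pick top-k experts per token, or replay previously recorded indices.

    logits: [num_tokens, num_experts] router scores
    replay: optional [num_tokens, k] expert indices recorded during rollout
    Returns (indices, weights): chosen expert ids and their softmax weights.
    """
    if replay is None:
        # Rollout: record the routing decision the inference engine made.
        indices = np.argsort(-logits, axis=-1)[:, :k]
    else:
        # Training: reuse rollout's expert choices instead of re-deciding,
        # while the gating weights still come from the training logits.
        indices = replay
    chosen = np.take_along_axis(logits, indices, axis=-1)
    weights = np.exp(chosen) / np.exp(chosen).sum(axis=-1, keepdims=True)
    return indices, weights

logits = np.random.randn(4, 8)                 # 4 tokens, 8 experts
recorded, _ = topk_route(logits, k=2)          # rollout: record the routing
replayed, _ = topk_route(logits + 0.01, k=2, replay=recorded)
assert (recorded == replayed).all()            # training replays the same experts
```

Even if the training logits drift slightly from the rollout logits, the replayed indices pin each token to the same experts, which is what keeps the two forward passes consistent.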

VeOmni

  • New VeOmni training backend with FSDP+SP+EP #4882

torchtitan

  • New torchtitan training backend with FSDP+TP+PP+CP+EP, roadmap: #5306

Rollout Engine

vLLM

  • Separate the model runner from the training process and refit weights via CUDA IPC #4280
  • FP8 rollout enhancement
  • Upgrade to vllm==0.17.0
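
FP8 rollout keeps inference weights in 8-bit floating point with a per-tensor scale, roughly halving rollout weight memory versus BF16 (several of the fixes below skip MoE router layers, which stay in high precision for accuracy). A crude fake-quantization sketch of the scale/round/rescale round trip; `E4M3_MAX` is the real e4m3 limit, but the uniform-grid rounding here only approximates true e4m3 behavior:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3

def fp8_fake_quantize(w):
    """Per-tensor scale, round onto a bounded grid, return grid values + scale."""
    scale = np.abs(w).max() / E4M3_MAX
    q = np.clip(np.round(w / scale), -E4M3_MAX, E4M3_MAX)
    return q.astype(np.float32), scale

def fp8_dequantize(q, scale):
    """Rescale back to the original range before the matmul."""
    return q * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = fp8_fake_quantize(w)
w_hat = fp8_dequantize(q, scale)
# Round-trip error is bounded by half a grid step (scale / 2).
assert np.abs(w_hat - w).max() <= scale / 2 + 1e-6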

SGLang

  • Support router replay
  • FP8 rollout enhancement
  • Upgrade to sglang==0.5.9

TensorRT-LLM

  • New TensorRT-LLM rollout backend, roadmap: #5042

Checkpoint Engine

  • Add checkpoint engine manager #5031
  • Add HCCL and Kimi checkpoint engine backends #4885 #4954
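
With several backends now available, the checkpoint engine manager's job is to resolve a configured backend name to a concrete engine. A hypothetical sketch of that registry pattern (class and backend names are illustrative and do not match verl's actual code):

```python
# Hypothetical sketch of a pluggable checkpoint-engine registry.
class CheckpointEngine:
    def save(self, state: dict, path: str) -> None:
        raise NotImplementedError
    def load(self, path: str) -> dict:
        raise NotImplementedError

_BACKENDS: dict = {}

def register(name: str):
    """Class decorator that files an engine under a backend name."""
    def deco(cls):
        _BACKENDS[name] = cls
        return cls
    return deco

@register("memory")
class InMemoryEngine(CheckpointEngine):
    """Stand-in backend; real ones would move bytes over a transport."""
    _store: dict = {}
    def save(self, state, path):
        self._store[path] = dict(state)
    def load(self, path):
        return self._store[path]

def get_engine(backend: str) -> CheckpointEngine:
    # Manager entry point: resolve a configured backend name to an engine,
    # failing loudly on unknown names.
    try:
        return _BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"unknown checkpoint engine backend: {backend}")
```

Adding a backend then means registering one more class rather than touching trainer code, which is the point of the manager abstraction.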

Trainer

  • One-step-off/fully async trainer refactor with verl-core
    • Unify checkpoint engine #5029
    • Unify partial rollout agent loop with auto resume #5487
    • Ascend NPU support for one-step-off/fully async
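
In the one-step-off pattern these features build on, rollout for step n+1 is produced while the trainer consumes step n, overlapping generation with training at a policy staleness bounded by one step. A toy schedule sketch (sequential here for clarity; the real trainer runs the producer concurrently, and `generate`/`train` are hypothetical stand-ins for verl's workers):

```python
from queue import Queue

def one_step_off(num_steps, generate, train):
    """Toy one-step-off schedule: rollout runs at most one step ahead.

    generate/train are hypothetical stand-ins for verl's rollout and
    trainer workers; shown sequentially here, concurrent in practice.
    """
    buf = Queue(maxsize=1)              # depth-1 buffer bounds staleness at 1
    buf.put(generate(step=0))           # warm up: produce the first batch
    for step in range(num_steps):
        batch = buf.get()               # batch made with a one-step-old policy
        if step + 1 < num_steps:
            # In the real async trainer this generation overlaps training.
            buf.put(generate(step=step + 1))
        train(batch)
```

The depth-1 queue is the key design choice: it lets generation and training proceed in parallel while guaranteeing the trainer never consumes data more than one policy version old.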

What's Changed

  • [ci] feat: add npu unit test by @yyyy2000 in #4626
  • [recipe] fix: workaround for making the one-step off-policy recipe compatible with IPv6 environments on Ascend NPU by @ji-huazhong in #4782
  • [fsdp] feat: integrate PrefixGrouper for GRPO training acceleration by @kevssim in #4368
  • [rollout] fix: use model_dump() for proper Pydantic serialization in token2text by @yurekami in #4706
  • [doc] chore: Change the name of npu unit test workflow by @yyyy2000 in #4800
  • [model] feat: support per sample temperature in trainer by @vermouth1992 in #4787
  • [tool] fix: add tools in single_turn_agent by @Junxiao-Zhao in #4798
  • [recipe] feat: migrate recipe to the dedicated repo verl-recipe as a submodule by @tongyx361 in #4795
  • [model] fix: fix temp dtype by @vermouth1992 in #4813
  • [vllm, sglang, rollout] fix: Fix a mistake when running run_qwen3_vl-30b-megatron.sh with latest verl and vllm0.12 by @cboss6 in #4810
  • [ckpt] feat: add checkpoint-engine abstraction by @wuxibin89 in #4775
  • [doc, ci] fix: Update Ascend doc and fix e2e_ascend CI by @FightingZhen in #4816
  • [trainer] feat: VeOmniEngine supports qwen3_vl ulysses by @A1waysBeenHere in #4806
  • [doc] chore: fix checkpoint engine image link by @wuxibin89 in #4821
  • [hardware] fix: automatically set device for SFT case by @A1waysBeenHere in #4828
  • [data] feat: TransferQueue - Update TransferQueue version and docs by @0oshowero0 in #4829
  • [doc] Update docs about fully_async_policy by @jsfanfanfan in #4826
  • [ckpt] fix: FSDP save ckpt after validation by @wdl339 in #4799
  • [perf] feat: Add MFU for Qwen3-VL dense by @zhihaofang1017 in #4753
  • [tool] fix: avoid nested ToolResponse in SandboxFusionTool by @Winston-Yuan in #4833
  • [vllm] fix: fix error in vllm patch for diff vllm version and add ci for moe with fp8 rollout by @Agoniii in #4824
  • [algo] feat: add optimal token baseline and variance proxy by @jiawei415 in #4678
  • [megatron] fix: Fix error in megatron workers by @zhihaofang1017 in #4832
  • [misc] feat: delete unnecessary base class in agent loop worker and vLLMHttpServer by @PeterSH6 in #4838
  • [misc] feat: consolidate tensordict before dispatch by @vermouth1992 in #4830
  • [training_utils] fix: json encode error in filelogger by @zhuangqh in #4811
  • [ckpt] chore: skip saving hf_checkpoint during megatron+lora training & add a separate lora merge script by @Junxiao-Zhao in #4839
  • [rollout, vllm] fix: accuracy issue in verl serve mode + vllm-ascend + dp + ep + tp scenarios by @leo-pony in #4783
  • [fsdp] feat: add validate process on trainer node when use_trainer_do_validate=True by @chenjiaoAngel in #4683
  • [misc] fix: recipe submodule accidentally been removed by @wuxibin89 in #4843
  • [worker, training_utils] fix: Engine Metric Aggregation by @JacobHelwig in #4778
  • [rollout] fix: configurable agent loop + multimodal data for fully-async by @XChen-Zero in #4842
  • [ci] test: switch the vlm rl test case in the npu environment to use the model engine by @ji-huazhong in #4844
  • [ckpt] fix: Megatron save ckpt after validation by @wdl339 in #4841
  • [megatron] feat: Share actor and ref in LoRA by @HollowMan6 in #4673
  • [fsdp, megatron] fix: Engine Rollout Worker LoRA Parameter Update by @JacobHelwig in #4836
  • [algo, rollout, sglang] feat: Support router replay with sglang by @moehanabi in #4840
  • [perf] feat: Add MFU for Qwen3-VL MoE by @zhihaofang1017 in #4859
  • [misc] fix: fix 3d position_ids for train_mini_batch by @wdl339 in #4860
  • fix(sft_trainer): Fix global_tokens and total_tokens metrics always showing 0.0 by @khazic in #4854
  • [rollout,vllm] feat: support vllm scheduling policy config and generate setting priority by @RobotGF in #4874
  • [ckpt] fix: prevent data loss when max_ckpt_to_keep=1 by @jreiml in #4873
  • [worker] feat: New engine share actor and ref for LoRA by @HollowMan6 in #4867
  • [worker] fix: new engine saves megatron LoRA adapters checkpoints by @HollowMan6 in #4866
  • [ckpt] fix: properly handle optimizer offloading for HybridDeviceOptimizer by @jreiml in #4870
  • [doc] chore: update verl meetup by @wuxibin89 in #4884
  • [vllm] Fix CLI argument serialization for list types by @jreiml in #4869
  • [data] fix: build_messages for multi-modal data by @ccilery in #4864
  • [rollout] fix: wrong display about Prometheus when using SGLang. by @jsfanfanfan in #4858
  • [vllm] fix: pad data_hp to be multiples of block_size by @Agoniii in #4835
  • [ci] feat: add ci to automatically submit PR request if precommit fails by @vermouth1992 in #4878
  • [doc] chore: async README backticks by @JacobHelwig in #4898
  • [doc] fix: correct typo in script comment by @Prozac614 in #4900
  • [veomni] refactor: minor refactoring to ensure veomni engine compatibility with forward_only mode by @ji-huazhong in #4889
  • [sglang] fix: sglang TP+DP support / port bug by @hustmf in #4715
  • Either remove + prefix: 'actor_rollout_ref.model.enable_activation_of… by @Tomsawyerhu in #4910
  • [vllm] fix: vllm_config arg gets removed in newer WorkerWrapperBase by @HollowMan6 in #4915
  • Correct Attention FLOPS estimation in flops_counter.py by @HaochenYuan in #4929
  • [algo, doc] feat: trust region sequence masking - (1) k3 KL avg and (2) veto for max criterion by @szrlee in #4544
  • [rollout] feat: use rollout and validate parallel process by @chenjiaoAngel in #4863
  • [model] feat: Add qwen3_vl_moe in VL_TYPE2INDEX for image_mask and vedio_mask computation by @A1waysBeenHere in #4923
  • [data] feat: TransferQueue - fix rm_score error of TransferQueue by @baymax591 in #4928
  • [vllm] fix: Update get_encoding import for vllm versions 0.13.0 and above by @xhx1022 in #4934
  • [ci] fix: Add hydra-core to pre-commit installation by @vermouth1992 in #4892
  • [data] feat: TransferQueue - Unify the return of reward by @walterchenchn in #4902
  • Revert "Correct Attention FLOPS estimation in flops_counter.py" by @vermouth1992 in #4937
  • [misc] fix: Correct Docstring arg in main() (PPO trainer) by @rfy48 in #4943
  • [misc] fix: resolve pre-commit hook execution errors by @ji-huazhong in #4941
  • [model] fix: qwen3-vl-30b npu_patch fix by @bjf-frz in #4888
  • [veomni] feat: support offloading/loading the veomni model/optimizer by @ji-huazhong in #4916
  • [rollout,mbridge] feat: add metrics for rollout num preempted and fix mbridge freeze moe by @RobotGF in #4956
  • [single_controller] fix: pass max_colocate_count and detached params when merging RayResourcePool by @wdl339 in #4949
  • [single_controller] feat: Support dispatch/collect nested tensors with 3 or more dimensions by @JacobHelwig in #4940
  • [env] fix: upgrade torch, cudnn and deps versions in vllm image to fix performance issue by @Begunner in #4960
  • [training_utils] fix: correctly _resolve_device when not specified by @HollowMan6 in #4961
  • [trainer] fix: pass scores device type to group_mean_std call by @HollowMan6 in #4962
  • [training_utils] fix: Correct Attention TFLOPS estimation & fix CI by @HaochenYuan in #4959
  • [training_utils] fix: cover device selection bug in group statistics with tests by @JohnConnor123 in #4967
  • [doc, data] fix: resolve broken documentation hyperlinks by @aphrodite1028 in #4970
  • [sglang, rollout] feat: support sglang as rollout engine in fully async policy by @AniZpZ in #4191
  • [megatron] feat: Using MTP in RL Training and Inference by @ArronHZG in #4936
  • [megatron] fix: fix megatron sync_weights oom on user_trainer_do_validate mode by @chenjiaoAngel in #4944
  • [rollout,vllm] fix: num_preempted metrics fix and typo correction in vllm async server by @RobotGF in #4976
  • [ray,rollout,trtllm] feat: Adding tensorrt_llm as new rollout engine by @joyang-nv in #4665
  • [data] fix: use lazy import for qwen_vl_utils in vision_utils.py by @Wheeeeeeeeels in #4991
  • [recipe,tool] feat: make GSM8K multiturn tool quickstart actually work by @letsgetai in #4998
  • [perf] feat: verl profiler system support Agent Loop scenario and integrate torch.profiler by @mengchengTang in #4320
  • [vllm] fix: vllm TP+DP support bug by @ccilery in #4969
  • [fsdp] fix: use module instead of function for fully_shard_module by @moaead in #5002
  • [misc] fix: update version in the main branch by @yyDing1 in #5006
  • [ckpt] feat: add Hccl ckpt engine backend by @hanhan-networking in #4885
  • [reward] fix: conditionally include reward_extra_keys in meta_info based on rm_scores presence by @none0663 in #5005
  • [rollout, vllm, sglang] fix: set default max_model_len by @ji-huazhong in #5018
  • [ci] fix: fix ci by @vermouth1992 in #5022
  • [ckpt] fix: npu load checkpoint by @Li-Yongwen in #4938
  • [megatron] fix: patch mcore for MLA support with flash_attn by @HollowMan6 in #4931
  • [BREAKING][worker, rollout, vllm] feat: implement vLLM colocated training-inference rollout with process separation by @jianjunzhong in #4280
  • [megatron] feat: LoRA adapter only refit (TensorLoRARequest) by @HollowMan6 in #4632
  • [veomni] refactor: no longer check the attn_implementation/moe_implementation in VeOmniEngineConfig by @A1waysBeenHere in #5019
  • [veomni] feat: support model resharding between veomni and rollout engine by @ji-huazhong in #5033
  • [trtllm] fix: Fixes for TRTLLM rollout by @hchings in #5032
  • [ci] fix: docker transformers==4.57.6 by @yyyy2000 in #5053
  • [rollout] feat: set max_model_len by max_model_len or use max_position_embedding by @RobotGF in #5052
  • [ckpt] feat: add CheckpointEngineManager by @wuxibin89 in #5031
  • [doc] feat: add npu gspo practice by @wucong25 in #4988
  • [ci] chore: move to verl-project by @wuxibin89 in #5059
  • [vllm, sglang] feat: opt for FP8 rollout memory by @Agoniii in #4997
  • [model] feat: add API to support automatically support engine backend by @vermouth1992 in #5050
  • [megatron, training_utils] fix: Patch MoEAlltoAllTokenDispatcher.preprocess for router replay by @HollowMan6 in #4986
  • [rollout, perf, cfg] fix: Add global step info and support more profile control params for rollout profiling (sglang backend) by @bithighrr in #5025
  • [fsdp, megatron] feat: Support fully-async training on Ascend NPU by @acat-rw in #5043
  • [doc, trainer] fix: shouldn't use rollout routing replay data for R2 by @HollowMan6 in #4973
  • [doc] feat: add dapo multi model optimization practice by @ChibiQuest in #5044
  • [ci] chore: fix ci failure by @wuxibin89 in #5068
  • [ci] chore: fix npu ci failure by @wucong25 in #5064
  • [sglang,ci,doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version for VeRL + Sglang by @xiazhahe in #5065
  • [megatron, training_utils] fix: router replay R3 align router replay data with global layer indices by @HollowMan6 in #5037
  • [trainer] fix: resolve dataset config in agent loop by @yyDing1 in #5034
  • [ckpt,rollout] fix: sleep_replicas before save_ckpt to avoid OOM by @RobotGF in #5079
  • [reward, ci] fix: colocate reward model ci break by @yyDing1 in #5084
  • [reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True by @none0663 in #5054
  • [rollout] fix: fix cpu allocation error in tensorrt_llm rollout manager by @SchumiDing in #5085
  • Revert "[reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True" by @wuxibin89 in #5091
  • [trtllm] fix: minor fixes to trtllm rollout by @hchings in #5095
  • [sglang] feat: add NPU GRPO training scripts for Qwen2.5-32B (FSDP/SGLang backends) by @xiazhahe in #5062
  • [doc] chore: update arch image by @wuxibin89 in #5106
  • [rollout] feat: automatically resume generation on abort by @wuxibin89 in #5071
  • [sglang, doc] feat: add NPU GRPO training scripts for Qwen3-30B (Megatron/SGLang backends) and update doc by @hustmf in #5060
  • [megatron] fix: megatron async save ckpt fix by @Leem-Li in #5016
  • [model] feat: add NPU GRPO training scripts for Qwen2.5-32B/Qwen3-30B (Megatron/vLLM backends) by @psyloy in #4984
  • [data] feat: Add support for Llama3.2-11-b-vision by @SchumiDing in #5112
  • [vllm] feat: revert to the default behavior of cudagraph_mode by @vermouth1992 in #5109
  • [fsdp] fix: Handle different transformers versions for Vision2Seq models in FSDP model merger by @liangxuZhang in #5108
  • [megatron] feat: Support MTP training in SFT by @arvyanh in #4981
  • [sglang] fix: update wiki to support speculative decode rollout by @ArronHZG in #5116
  • [training_utils] fix: add upcasting for seq_len_effective to avoid potential overflow in calculate_workload by @albertcity in #5110
  • [ci] feat: add npu workflow,e2e_sft_llm&model&reward_model_vllm by @yyyy2000 in #5039
  • [doc] chore: update readme for SPEAR algorithm by @yuleiqin in #5124
  • [vllm] feat: add shared memory support for weight transfer and IPC support checks by @jianjunzhong in #5089
  • Revert "[rollout] feat: automatically resume generation on abort" by @PeterSH6 in #5127
  • [sglang] fix: skip MoE router layers for FP8 quantization by @eternally-z in #5122
  • [vllm] feat: get gpt-oss encoding on demand by @vermouth1992 in #5131
  • [rollout,vllm] feat: revert default value of max_num_seqs by @RobotGF in #5139
  • [misc] feat: Modify transformers dependency version in requirements by @vermouth1992 in #5141
  • [ci] fix: fix ci by downgrade transformers to <5 by @vermouth1992 in #5143
  • [misc] chore: rename huggingface-cli to hf to favor transformers v5 by @vermouth1992 in #5145
  • [worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode by @RobotGF in #5144
  • [megatron] fix: patched_routing accepts arbitrary args by @HollowMan6 in #5155
  • Revert "[worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode" by @vermouth1992 in #5156
  • [model, algo] feat: implement SAC algorithm and support Pi0.5 model by @Miical in #5118
  • [training_utils] fix: Resolved bugs and conflicts in the fully async caused by multiple PRs by @ZLiao097 in #5100
  • [reward] feat: split reward loop manager and agent loop manager by @yyDing1 in #5134
  • [Doc] feat: update README to add new awesome project RuleReasoner by @jacklanda in #5157
  • [trainer] feat: move save_ckpt before update_weights and validate by @RobotGF in #5137
  • [ci,doc] feat: Add ascend_ci_guide by @yyyy2000 in #5163
  • [rollout] fix: remove dtype cast by @vermouth1992 in #5117
  • [hardware] fix: update architecture check and CANN toolkit path retrieval in device.py by @jianjunzhong in #5142
  • [vllm] fix: build_app() missing 1 required positional argument: 'supported_tasks' by @HollowMan6 in #5093
  • [perf] feat: clear megatron global buffer memory by @wuxibin89 in #5173
  • [vllm, rollout] fix: Use different seeds for vllm by @victordion in #5179
  • [ci] fix: fully async ci break by @yyDing1 in #5166
  • [vllm] feat: make seed configurable and different among replicas by @vermouth1992 in #5181
  • [data] fix: keyword video_metadata by @sophiayyya in #5177
  • [megatron] fix: checkpoints uses fully_reshardable by default when supported by @HollowMan6 in #5154
  • [trtllm, rollout] test: add unittest by @hchings in #5102
  • [reward] refactor: migrate all reward managers to the new asynchronous reward manager by @yyDing1 in #5189
  • [vllm] fix: handle multimodal inputs correctly in full async mode by @Silas-11 in #5160
  • [megatron] feat: fused kernel support for new model engine by @HollowMan6 in #5191
  • [fsdp] feat: Merge lora in fsdp training to speed up rollout by @amzfang in #5115
  • [megatron] Add Megatron-Bridge support in fully async policy by @eternally-z in #5196
  • [perf] fix: infer server profiler args fix by @mengchengTang in #5121
  • [doc, perf] feat: add perf_tuning_on_ascend by @tardis-key in #5104
  • [ci] feat: add three npu workflow yml test by @daikang6 in #4978
  • [vllm] fix: ignore MoE router layers for FP8 quantization by @zpqiu in #5107
  • [worker, training_utils] fix: Metric Aggregation Across DP Ranks by @JacobHelwig in #5203
  • [megatron] fix: add protections for logits_processor_args.pop("loss_mask"), which may cause the forward_fn of value net collapse by @albertcity in #5204
  • [trtllm] fix: reduce peak mem usage during update_weight() by @hchings in #5212
  • [algo] feat: support rollout router replay in MegatronEngine by @xhx1022 in #5185
  • [trtllm] fix: add synchronization before resume kv_cache to prevent oom in non-leader ranks by @shuyixiong in #5208
  • [perf] feat: add images_seqlens on mfu calculation for engine_worker by @alwaysyiyu in #5207
  • [reward] fix: Add assert to prevent reward NaN caused by overlong_cfg.len=0 by @ZLiao097 in #5216
  • [recipe] refactor: refactor ray trainer for separate recipe use. (fully async / one step off) by @ArronHZG in #5184
  • [BREAKING][reward] refactor: remove reward model worker code and invocation by @yyDing1 in #5194
  • [fsdp] fix: Support trust_remote_code during FSDP HuggingFace checkpoint save by @thvasilo in #5200
  • [worker] feat: Avoid redundant base weight sync when engine doesn't sleep by @JohnConnor123 in #5147
  • [ci] chore: fix npu ci by @wucong25 in #5218
  • [vllm] fix: apply moe weight loader patch for standard weight loading by @zjchenn in #5234
  • [ci] chore: fix npu ci setuptools by @yyyy2000 in #5238
  • [ci] chore: fix npu ci setuptools, keep update pip and packaging by @yyyy2000 in #5239
  • [reward] fix: preserve input non_tensor_batch in AgentLoopManager when reward_loop_worker_handles is None by @none0663 in #5195
  • [perf] fix: fix npu profiling scripts by @tongtong0613 in #5226
  • [megatron] feat: use yaml to manage mbridge args by @Kite0011 in #4584
  • [algo] feat: reduce routed expert padding via NestedTensor and uint8 dtype by @xhx1022 in #5240
  • [ray,trainer] feat: add master port range configuration for port range by @RobotGF in #5201
  • [BREAKING][reward] refactor: deprecate batch reward manager by @yyDing1 in #5237
  • [fsdp] feat: add script for qwen3next training on npu platform by @zjchenn in #5236
  • [doc] fix: Update ascend_sglang_best_practices.rst by @hustmf in #5261
  • [vllm, rollout] feat: update abort function with vllm internal pause_generation api by @PeterSH6 in #5253
  • [veomni, trainer] feat: add rl support for veomni backend by @ji-huazhong in #4882
  • [vllm] fix: run post-load weight processing once after async IPC sync by @zjchenn in #5235
  • [doc] chore: version of dapo_multi_model_optimization_practice by @ChibiQuest in #5263
  • [rollout] feat: make more rollout flags configurable to trtllm backend by @Superjomn in #5258
  • [doc] refactor: update reward documents by @yyDing1 in #5272
  • [doc] chore: Ascend retool practice doc by @LeoYao123 in #5266
  • [vllm, rollout] fix: auto-downgrade cudagraph_mode to PIECEWISE when DCP is enabled by @Siritao in #5262
  • [fsdp, veomni, trainer] fix: restrict npu-patch scope to avoid veomni backend interference by @ji-huazhong in #5268
  • [BREAKING][reward] refactor: the full reward configuration by @yyDing1 in #5255
  • [ci] chore: delete redundant npu ci by @yyyy2000 in #5259
  • [fsdp, megatron] refactor: Refactor Fully Async Implementation via Engine Workers by @ZLiao097 in #5269
  • [megatron, model] chore: add example of nemotron nano v3 by @ISEEKYAN in #5284
  • [misc] chore: fix veomni_trainer.yaml by @wuxibin89 in #5285
  • [megatron] fix: fallback to moe_router_padding_for_fp8 in router replay patch by @xhx1022 in #5283
  • [reward] fix: backward compatibility with old reward config by @yyDing1 in #5287
  • [reward] fix: reward model args and reward_kwargs bug by @yyDing1 in #5289
  • [doc] chore: gspo update config and add version with npu by @chengminhua in #5279
  • [fsdp,veomni] fix: remove FSDPUlyssesShardingManager to make eval_mode/train_mode reentrant by @wuxibin89 in #5305
  • [veomni] refactor: Modify dp related parameters to align with FSDP backend and remove temporarily unsupported TP/PP/CP parameters by @ChengQianqian in #5303
  • [trtllm] feat: use max utilization scheduler by default by @tongyuantongyu in #5302
  • [worker, tool] fix: stabilize agent loop extra fields schema by @denismegerle in #5301
  • [algo] feat: add NPU SAPO training script for Qwen3-8B (FSDP/vLLM backends) by @Vvictorrrr in #5257
  • [fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-8B (FSDP/VLLM backends) by @zhihaofang1017 in #5250
  • [fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends) by @alwaysyiyu in #5260
  • [model,cfg] fix: type annotation for Lora target_modules by @thvasilo in #5223
  • [megatron] feat: Support LoRA training with FP16 using Megatron-Bridge. by @xichengpro in #4648
  • [ci] fix: main pre-commit by @pengwu22 in #5318
  • [misc] refactor: delete remaining batch-mode code in single controller by @ji-huazhong in #5319
  • [rollout] fix: make skip rollout compatible with async mode by @ChengQianqian in #5320
  • [veomni, trainer] fix: padding pixel value with padding_scale for vl model by @A1waysBeenHere in #5322
  • [fsdp,algo] feat: add NVFP4 QAT (Quantization-Aware Training) support by @zhangyimi in #5190
  • [docs] Add new awesome work using Verl by @MING-ZCH in #5328
  • [vllm] feat: remove workers from vLLMHttpServer by @tongyx361 in #5330
  • Revert "[vllm] feat: remove workers from vLLMHttpServer" by @PeterSH6 in #5333
  • [misc] refactor: remove deprecated codes by @ji-huazhong in #5336
  • [misc] fix: include config files for experimental entrypoints in package data by @guillemgt in #5343
  • [ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile by @ji-huazhong in #5345
  • Revert "[ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile" by @ji-huazhong in #5353
  • [reward] fix: empty class_dict for standalone reward model resource pool by @yyDing1 in #5348
  • [trainer] feat: Add Torchtitan as alternative training engine by @acisseJZhong in #5051
  • [training_utils] fix: mask out-of-bounds vocab entries fused kernel LCE logsumexp by @EricMarcus-ai in #5349
  • [rollout] fix: Include routed_experts in ToolAgentLoop return value to support R3 router replay by @mirrorboat in #5368
  • [misc] fix: pass torch dtype when init random model by @HollowMan6 in #5370
  • [ci] chore: pin version cupy-cuda12x==13.6.0 by @wuxibin89 in #5377
  • [doc] chore: ascend add performance analysis guide and update some version info by @chengminhua in #5324
  • [trainer] feat: Support RL trainer with TorchtitanEngine by @acisseJZhong in #5356
  • [algo] feat: Exception for agg_loss when dp_size > 1 but global information is absent & fix: correct & consistent loss aggregation for "seq-mean-token-sum-norm" by @tongyx361 in #5366
  • [rollout] fix: make run_uvicorn behavior more reliable by @tongyuantongyu in #5383
  • [doc] feat: update documentation for The Optimal Token Baseline and Rollout Correction by @jiawei415 in #5380
  • [trainer] refactor: remove fsdp_sft_trainer.py by @wuxibin89 in #5382
  • [ci] fix: occasional CI failures caused by sglang server port conflicts by @pengwu22 in #5310
  • [fsdp] fix: add aggressive_empty_cache at end of init_model to prevent vLLM OOM by @EricMarcus-ai in #5384
  • [doc, worker] feat: Enable Megatron-Bridge for MTP by @HollowMan6 in #5323
  • [ckpt] feat: add kimi ckpt engine backend by @kip-cxj in #4954
  • [misc] feat: ignore pyrightconfig.json to allow users to customize pyrightconfig to fix breaks by @tongyx361 in #5385
  • [ci] chore: update triton-ascend and fix npu ut by @yyyy2000 in #5396
  • [fsdp, megatron] feat: refactor fully-async and one-step-off training to support multiple checkpoint engine backends by @Shangwei-Li in #5029
  • [doc] feat: add fully async and one step off to PR Checklist by @ArronHZG in #5404
  • [doc] chore: ascend update gspo optimization practice document by @chengminhua in #5408
  • [algo] feat: add DPPO with binary TV or binary KL implementation by @QPHutu in #5397
  • [doc] chore: npu best practice doc by @hustmf in #5415
  • [algo] fix: seq mean and default scale factor loss_mask.shape[-1] as in seq-mean-token-sum-norm by @tongyx361 in #5417
  • [megatron] fix: missing model offload to CPU for forward_only mode by @xhx1022 in #5406
  • [megatron] feat: enhance model offloading and loading for frozen parameters by @RobotGF in #5412
  • [perf] fix: the overwritten of Torch_profile with multi steps. by @Rhetee in #5395
  • [trainer] feat: add padding for tensor alignment in preprocess_thd_no_padding function by @RobotGF in #5410
  • [tool] fix: handle empty image inputs in ToolAgentLoop by @denismegerle in #5420
  • [rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter by @denismegerle in #5359
  • [tool] refactor: remove tool schema plumbing from SingleTurnAgentLoop by @denismegerle in #5425
  • [misc] feat: Add code for data grouping in no-padding scenario by @Kite0011 in #5424
  • [doc] add Dr. MAS to awesome work by @langfengQ in #5427
  • [BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout by @wuxibin89 in #5418
  • [ci] chore: bump the version of vllm-ascend to v0.11.0 in the ascend dockerfile by @ji-huazhong in #5431
  • [doc] chore: fix npu docs by @wucong25 in #5428
  • [doc] fix: fix npu retool docs by @LeoYao123 in #5449
  • [data] refactor: TransferQueue - retire legacy integration codes by @0oshowero0 in #5454
  • [ci] fix: failed trtllm_unit_tests with attribute error by @HollowMan6 in #5446
  • [megatron] fix: pass dp_group to rearrange_micro_batches to fix DeepEP timeout by @xhx1022 in #5451
  • [rollout] fix: remove unexpected concurrency bound at 1000 by @tongyuantongyu in #5402
  • [data] fix: accept jsonl dataset files by @zqzten in #5456
  • [single_controller] refactor: use BatchData to simplify concat and chunk in single_controller by @zw0610 in #5450
  • [megatron] feat: Support DSA indexer LoRA mappings by @HollowMan6 in #5462
  • [doc] fix: fix typo in agentic rl doc by @KevinZeng08 in #5461
  • [misc] chore: support transformers 5 by @HollowMan6 in #5445
  • [doc] fix: fix dapo multi model practice by @ChibiQuest in #5453
  • [trainer] feat: Update trainer API for TorchtitanEngine by @acisseJZhong in #5457
  • [rollout] refactor: bucketed transfer utils by @pengwu22 in #5309
  • [rollout] feat: update trtllm docker by @Superjomn in #5386
  • [doc] fix: fix npu retool doc by @LeoYao123 in #5467
  • [ckpt] feat: add mooncake backend by @x1314aq in #5176
  • [doc] chore: add ascend backend feature by @wucong25 in #5466
  • [megatron] fix: support hybrid dense/MoE models in router replay with PP/VPP by @xhx1022 in #5452
  • [megatron] fix: patch support newer mcore version by @HollowMan6 in #5372
  • [ci] fix: sanity issue related to Last updated string by @HollowMan6 in #5477
  • [rollout] feat: support auto resume on abort in FullyAsyncLLMServerManager by @wuxibin89 in #5430
  • [trainer] feat: Support EP with TorchtitanEngine by @acisseJZhong in #5469
  • docs: fix typo in kl_penalty docstring by @ZHAOoops in #5481
  • [megatron] fix: add FP8 block quantization padding for EngineWorker by @zpqiu in #5440
  • [ckpt, model] fix: preserve lora_alpha in model_merger via training meta by @Yatogaii in #5326
  • [fsdp,algo] feat: Support QAT (NVFP4) in FSDPEngine for the unified engine_workers architecture by @zhangyimi in #5411
  • [doc] feat: add mtp spec log by @ArronHZG in #5491
  • [reward] feat: add example scripts for reward model usage by @yyDing1 in #5486
  • [BREAKING][trtllm] feat: Add FP8 refit support for trtllm rollout by @shuyixiong in #5374
  • [veomni,ci] fix: Modify default setting in veomni test scripts to prevent misunderstanding by @0oshowero0 in #5484
  • [ckpt] fix: test issues of kimi and mooncake backend by @x1314aq in #5500
  • [doc] chore: update FP8 guide with E2E training section and reorganization by @zpqiu in #5502
  • [model,doc] feat: add qwen3 32B megatron 1k to 256k by @ChibiQuest in #5497
  • [doc] chore: npu docker support vllm013 by @yyyy2000 in #5471
  • [doc] fix: update recipe link to fix 404 not found by @tardis-key in #5286
  • [ci] feat: add npu nightly ci by @daikang6 in #5225
  • [data] fix: use %-style format placeholders in logger.warning() by @cavities12 in #5512
  • [rollout] feat: global request-level load balancer single source routing by @aoshen524 in #5399
  • [rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in #5149
  • Revert "[rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout" by @wuxibin89 in #5525
  • [ckpt] fix: Fix checkpoint engine backend unset error by @ZLiao097 in #5473
  • [rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in #5528
  • [Megatron] feat: Support routing replay on NPU with performance and compatibility enhancements by @755651978 in #5298
  • [rollout] fix: update checkpoint_engine bucket size parameter for Ascend compatibility by @nuerxiati in #5539
  • [misc] feat: support dynamic bsz using group size by @Kite0011 in #5438
  • [fully_async, one_step_off] feat: support auto resume on abort when using fully_async by @ArronHZG in #5487
  • [doc] chore: add note for kimi ckpt engine by @kip-cxj in #5546
  • [perf, trainer, training_utils] fix: Try to monitor with mlflow up to 3 times, and avoid duplicate key processing in each step by @sheilaliuxl in #5548
  • [trainer] fix: support nsys when using sft_trainer_ray.py by @arvyanh in #5489
  • [rollout] fix: reintroduce NCCL_CUMEM_ENABLE for weight synchronization in async rollout environments by @RobotGF in #5522
  • [ci] feat: npu nightly ci log is redirected to the specified directory by @daikang6 in #5557
  • [ci] fix: sft_trainer_ray ci break by @wuxibin89 in #5562
  • [fsdp] fix: wrap embed_tokens/lm_head by name for peft models by @cavities12 in #5516
  • [ci] chore: update npu ci to vllm013 by @yyyy2000 in #5523
  • [algo] feat: support router replay in MegatronEngine by @xhx1022 in #5219
  • [docker] feat: update stable image to vllm==0.17.0, sglang==0.5.9 by @Begunner in #5542
  • [megatron, model] feat: qwen3.5 example by @ISEEKYAN in #5381
  • [algo] feat: add GDPO (Group reward-Decoupled Normalization Policy Optimization) algorithm by @Rhetee in #5503
  • [megatron] feat: model engine support mtp by @ArronHZG in #5561
  • [doc] fix: fix te pip install instructions by @TKONIY in #5501
  • [rollout] fix: agent loop copy read-only routed_experts before torch conversion by @HollowMan6 in #5519
  • [ci] chore: change machine for npu ci by @yyyy2000 in #5578
  • [megatron] fix: apply override_transformer_config inside mindspeed engine to avoid conflict with other training engines by @ChengQianqian in #5589
  • [rollout] fix: fix some compatibility issues with qwen vl series support of trtllm rollout by @SchumiDing in #5583
  • [misc] chore: bump version to 0.7.1 by @wuxibin89 in #5602

New Contributors

  • @yyyy2000 made their first contribution in #4626
  • @Junxiao-Zhao made their first contribution in #4798
  • @cboss6 made their first contribution in #4810
  • @wdl339 made their first contribution in #4799
  • @zhihaofang1017 made their first contribution in #4753
  • @Winston-Yuan made their first contribution in #4833
  • @jiawei415 made their first contribution in #4678
  • @XChen-Zero made their first contribution in #4842
  • @khazic made their first contribution in #4854
  • @jreiml made their first contribution in #4873
  • @Prozac614 made their first contribution in #4900
  • @hustmf made their first contribution in #4715
  • @Tomsawyerhu made their first contribution in #4910
  • @xhx1022 made their first contribution in #4934
  • @walterchenchn made their first contribution in #4902
  • @rfy48 made their first contribution in #4943
  • @bjf-frz made their first contribution in #4888
  • @JohnConnor123 made their first contribution in #4967
  • @aphrodite1028 made their first contribution in #4970
  • @AniZpZ made their first contribution in #4191
  • @joyang-nv made their first contribution in #4665
  • @Wheeeeeeeeels made their first contribution in #4991
  • @letsgetai made their first contribution in #4998
  • @moaead made their first contribution in #5002
  • @hanhan-networking made their first contribution in #4885
  • @Li-Yongwen made their first contribution in #4938
  • @jianjunzhong made their first contribution in #4280
  • @hchings made their first contribution in #5032
  • @bithighrr made their first contribution in #5025
  • @ChibiQuest made their first contribution in #5044
  • @xiazhahe made their first contribution in #5065
  • @SchumiDing made their first contribution in #5085
  • @psyloy made their first contribution in #4984
  • @liangxuZhang made their first contribution in #5108
  • @arvyanh made their first contribution in #4981
  • @albertcity made their first contribution in #5110
  • @yuleiqin made their first contribution in #5124
  • @eternally-z made their first contribution in #5122
  • @Miical made their first contribution in #5118
  • @jacklanda made their first contribution in #5157
  • @victordion made their first contribution in #5179
  • @sophiayyya made their first contribution in #5177
  • @Silas-11 made their first contribution in #5160
  • @amzfang made their first contribution in #5115
  • @daikang6 made their first contribution in #4978
  • @shuyixiong made their first contribution in #5208
  • @alwaysyiyu made their first contribution in #5207
  • @thvasilo made their first contribution in #5200
  • @Superjomn made their first contribution in #5258
  • @LeoYao123 made their first contribution in #5266
  • @Siritao made their first contribution in #5262
  • @ChengQianqian made their first contribution in #5303
  • @tongyuantongyu made their first contribution in #5302
  • @denismegerle made their first contribution in #5301
  • @Vvictorrrr made their first contribution in #5257
  • @zhangyimi made their first contribution in #5190
  • @MING-ZCH made their first contribution in #5328
  • @guillemgt made their first contribution in #5343
  • @acisseJZhong made their first contribution in #5051
  • @mirrorboat made their first contribution in #5368
  • @kip-cxj made their first contribution in #4954
  • @QPHutu made their first contribution in #5397
  • @Rhetee made their first contribution in #5395
  • @zqzten made their first contribution in #5456
  • @KevinZeng08 made their first contribution in #5461
  • @x1314aq made their first contribution in #5176
  • @ZHAOoops made their first contribution in #5481
  • @Yatogaii made their first contribution in #5326
  • @cavities12 made their first contribution in #5512
  • @755651978 made their first contribution in #5298
  • @sheilaliuxl made their first contribution in #5548
  • @TKONIY made their first contribution in #5501

Full Changelog: v0.7.0...v0.7.1