## Highlights
### Model Engine
#### Megatron
- Support R3 router replay with vllm and sglang #4840 #4986 #5185
- Support MTP training in SFT/RL #4981 #4936
- LoRA training enhancement with megatron-bridge: actor/ref share, LoRA adapter-only refit, etc. #4673 #4632
- Support Qwen3.5 series training with mbridge #5381
#### VeOmni
- New veomni training backend with FSDP+SP+EP #4882
#### torchtitan
- New torchtitan training backend with FSDP+TP+PP+CP+EP, roadmap: #5306
### Rollout Engine
#### vLLM
- Separate model runner from training process and refit weights by cuda ipc #4280
- FP8 rollout enhancement
- Upgrade to vllm==0.17.0
#### SGLang
- Support router replay
- FP8 rollout enhancement
- Upgrade to sglang==0.5.9
#### TensorRT-LLM
- New tensorrt-llm rollout backend, roadmap: #5042
### Checkpoint Engine
### Trainer
- one-step-off/fully async trainer refactor with verl-core
## What's Changed
- [ci] feat: add npu unit test by @yyyy2000 in #4626
- [recipe] fix: workaround for making the one-step off-policy recipe compatible with IPv6 environments on Ascend NPU by @ji-huazhong in #4782
- [fsdp] feat: integrate PrefixGrouper for GRPO training acceleration by @kevssim in #4368
- [rollout] fix: use model_dump() for proper Pydantic serialization in token2text by @yurekami in #4706
- [doc] chore: Change the name of npu unit test workflow by @yyyy2000 in #4800
- [model] feat: support per sample temperature in trainer by @vermouth1992 in #4787
- [tool] fix: add tools in single_turn_agent by @Junxiao-Zhao in #4798
- [recipe] feat: migrate `recipe` to the dedicated repo `verl-recipe` as a submodule by @tongyx361 in #4795
- [model] fix: fix temp dtype by @vermouth1992 in #4813
- [vllm, sglang, rollout] fix: Fix a mistake when running run_qwen3_vl-30b-megatron.sh with latest verl and vllm0.12 by @cboss6 in #4810
- [ckpt] feat: add checkpoint-engine abstraction by @wuxibin89 in #4775
- [doc, ci] fix: Update Ascend doc and fix e2e_ascend CI by @FightingZhen in #4816
- [trainer] feat: VeOmniEngine supports qwen3_vl ulysses by @A1waysBeenHere in #4806
- [doc] chore: fix checkpoint engine image link by @wuxibin89 in #4821
- [hardware] fix: automatically set device for SFT case by @A1waysBeenHere in #4828
- [data] feat: TransferQueue - Update TransferQueue version and docs by @0oshowero0 in #4829
- [doc] Update docs about fully_async_policy by @jsfanfanfan in #4826
- [ckpt] fix: FSDP save ckpt after validation by @wdl339 in #4799
- [perf] feat: Add MFU for Qwen3-VL dense by @zhihaofang1017 in #4753
- [tool] fix: avoid nested ToolResponse in SandboxFusionTool by @Winston-Yuan in #4833
- [vllm] fix: fix error in vllm patch for diff vllm version and add ci for moe with fp8 rollout by @Agoniii in #4824
- [algo] feat: add optimal token baseline and variance proxy by @jiawei415 in #4678
- [megatron] fix: Fix error in megatron workers by @zhihaofang1017 in #4832
- [misc] feat: delete unnecessary base class in agent loop worker and vLLMHttpServer by @PeterSH6 in #4838
- [misc] feat: consolidate tensordict before dispatch by @vermouth1992 in #4830
- [training_utils] fix: json encode error in filelogger by @zhuangqh in #4811
- [ckpt] chore: skip saving hf_checkpoint during megatron+lora training & add a separate lora merge script by @Junxiao-Zhao in #4839
- [rollout, vllm] fix: accuracy issue in verl serve mode + vllm-ascend + dp + ep + tp scenarios by @leo-pony in #4783
- [fsdp] feat: add validate process on trainer node when use_trainer_do_validate=True by @chenjiaoAngel in #4683
- [misc] fix: recipe submodule accidentally been removed by @wuxibin89 in #4843
- [worker, training_utils] fix: Engine Metric Aggregation by @JacobHelwig in #4778
- [rollout] fix: configurable agent loop + multimodal data for fully-async by @XChen-Zero in #4842
- [ci] test: switch the vlm rl test case in the npu environment to use the model engine by @ji-huazhong in #4844
- [ckpt] fix: Megatron save ckpt after validation by @wdl339 in #4841
- [megatron] feat: Share actor and ref in LoRA by @HollowMan6 in #4673
- [fsdp, megatron] fix: Engine Rollout Worker LoRA Parameter Update by @JacobHelwig in #4836
- [algo, rollout, sglang] feat: Support router replay with sglang by @moehanabi in #4840
- [perf] feat: Add MFU for Qwen3-VL MoE by @zhihaofang1017 in #4859
- [misc] fix: fix 3d position_ids for train_mini_batch by @wdl339 in #4860
- fix(sft_trainer): Fix global_tokens and total_tokens metrics always showing 0.0 by @khazic in #4854
- [rollout,vllm] feat: support vllm scheduling policy config and generate setting priority by @RobotGF in #4874
- [ckpt] fix: prevent data loss when max_ckpt_to_keep=1 by @jreiml in #4873
- [worker] feat: New engine share actor and ref for LoRA by @HollowMan6 in #4867
- [worker] fix: new engine saves megatron LoRA adapters checkpoints by @HollowMan6 in #4866
- [ckpt] fix: properly handle optimizer offloading for HybridDeviceOptimizer by @jreiml in #4870
- [doc] chore: update verl meetup by @wuxibin89 in #4884
- [vllm] Fix CLI argument serialization for list types by @jreiml in #4869
- [data] fix: build_messages for multi-modal data by @ccilery in #4864
- [rollout] fix: wrong display about Prometheus when using SGLang by @jsfanfanfan in #4858
- [vllm] fix: pad data_hp to be multiples of block_size by @Agoniii in #4835
- [ci] feat: add ci to automatically submit PR request if precommit fails by @vermouth1992 in #4878
- [doc] chore: async README backticks by @JacobHelwig in #4898
- [doc] fix: correct typo in script comment by @Prozac614 in #4900
- [veomni] refactor: minor refactoring to ensure veomni engine compatibility with forward_only mode by @ji-huazhong in #4889
- [sglang] fix: sglang TP+DP support / port bug by @hustmf in #4715
- Either remove + prefix: 'actor_rollout_ref.model.enable_activation_of… by @Tomsawyerhu in #4910
- [vllm] fix: `vllm_config` arg gets removed in newer `WorkerWrapperBase` by @HollowMan6 in #4915
- Correct Attention FLOPS estimation in flops_counter.py by @HaochenYuan in #4929
- [algo, doc] feat: trust region sequence masking - (1) k3 KL avg and (2) veto for max criterion by @szrlee in #4544
- [rollout] feat: use rollout and validate parallel process by @chenjiaoAngel in #4863
- [model] feat: Add qwen3_vl_moe in VL_TYPE2INDEX for image_mask and vedio_mask computation by @A1waysBeenHere in #4923
- [data] feat: TransferQueue - fix rm_score error of TransferQueue by @baymax591 in #4928
- [vllm] fix: Update get_encoding import for vllm versions 0.13.0 and above by @xhx1022 in #4934
- [ci] fix: Add hydra-core to pre-commit installation by @vermouth1992 in #4892
- [data] feat: TransferQueue - Unify the return of reward by @walterchenchn in #4902
- Revert "Correct Attention FLOPS estimation in flops_counter.py" by @vermouth1992 in #4937
- [misc] fix: Correct Docstring arg in main() (PPO trainer) by @rfy48 in #4943
- [misc] fix: resolve pre-commit hook execution errors by @ji-huazhong in #4941
- [model] fix: qwen3-vl-30b npu_patch fix by @bjf-frz in #4888
- [veomni] feat: support offloading/loading the veomni model/optimizer by @ji-huazhong in #4916
- [rollout,mbridge] feat: add metrics for rollout num preempted and fix mbridge freeze moe by @RobotGF in #4956
- [single_controller] fix: pass max_colocate_count and detached params when merging RayResourcePool by @wdl339 in #4949
- [single_controller] feat: Support dispatch/collect nested tensors with 3 or more dimensions by @JacobHelwig in #4940
- [env] fix: upgrade torch, cudnn and deps versions in vllm image to fix performance issue by @Begunner in #4960
- [training_utils] fix: correctly `_resolve_device` when not specified by @HollowMan6 in #4961
- [trainer] fix: pass scores device type to `group_mean_std` call by @HollowMan6 in #4962
- [training_utils] fix: Correct Attention TFLOPS estimation & fix CI by @HaochenYuan in #4959
- [training_utils] fix: cover device-selection bug in group statistics with tests by @JohnConnor123 in #4967
- [doc, data] fix: resolve broken documentation hyperlinks by @aphrodite1028 in #4970
- [sglang, rollout] feat: support sglang as rollout engine in fully async policy by @AniZpZ in #4191
- [megatron] feat: Using MTP in RL Training and Inference by @ArronHZG in #4936
- [megatron] fix: fix megatron sync_weights oom on user_trainer_do_validate mode by @chenjiaoAngel in #4944
- [rollout,vllm] fix: num_preempted metrics fix and typo correction in vllm async server by @RobotGF in #4976
- [ray,rollout,trtllm] feat: Adding tensorrt_llm as new rollout engine by @joyang-nv in #4665
- [data] fix: use lazy import for qwen_vl_utils in vision_utils.py by @Wheeeeeeeeels in #4991
- [recipe,tool] feat: make GSM8K multiturn tool quickstart actually work by @letsgetai in #4998
- [perf] feat: verl profiler system support Agent Loop scenario and integrate torch.profiler by @mengchengTang in #4320
- [vllm] fix: vllm TP+DP support bug by @ccilery in #4969
- [fsdp] fix: use module instead of function for fully_shard_module by @moaead in #5002
- [misc] fix: update version in the main branch by @yyDing1 in #5006
- [ckpt] feat: add Hccl ckpt engine backend by @hanhan-networking in #4885
- [reward] fix: conditionally include reward_extra_keys in meta_info based on rm_scores presence by @none0663 in #5005
- [rollout, vllm, sglang] fix: set default max_model_len by @ji-huazhong in #5018
- [ci] fix: fix ci by @vermouth1992 in #5022
- [ckpt] fix: npu load checkpoint by @Li-Yongwen in #4938
- [megatron] fix: patch mcore for MLA support with flash_attn by @HollowMan6 in #4931
- [BREAKING][worker, rollout, vllm] feat: implement vLLM colocated training-inference rollout with process separation by @jianjunzhong in #4280
- [megatron] feat: LoRA adapter only refit (TensorLoRARequest) by @HollowMan6 in #4632
- [veomni] refactor: no longer check the attn_implementation/moe_implementation in VeOmniEngineConfig by @A1waysBeenHere in #5019
- [veomni] feat: support model resharding between veomni and rollout engine by @ji-huazhong in #5033
- [trtllm] fix: Fixes for TRTLLM rollout by @hchings in #5032
- [ci] fix: docker transformers==4.57.6 by @yyyy2000 in #5053
- [rollout] feat: set max_model_len by max_model_len or use max_position_embedding by @RobotGF in #5052
- [ckpt] feat: add CheckpointEngineManager by @wuxibin89 in #5031
- [doc] feat: add npu gspo practice by @wucong25 in #4988
- [ci] chore: move to verl-project by @wuxibin89 in #5059
- [vllm, sglang] feat: opt for FP8 rollout memory by @Agoniii in #4997
- [model] feat: add API to support automatically support engine backend by @vermouth1992 in #5050
- [megatron, training_utils] fix: Patch MoEAlltoAllTokenDispatcher.preprocess for router replay by @HollowMan6 in #4986
- [rollout, perf, cfg] fix: Add global step info and support more profile control params for rollout profiling (sglang backend) by @bithighrr in #5025
- [fsdp, megatron] feat: Support fully-async training on Ascend NPU by @acat-rw in #5043
- [doc, trainer] fix: shouldn't use rollout routing replay data for R2 by @HollowMan6 in #4973
- [doc] feat: add dapo multi model optimization practice by @ChibiQuest in #5044
- [ci] chore: fix ci failure by @wuxibin89 in #5068
- [ci] chore: fix npu ci failure by @wucong25 in #5064
- [sglang,ci,doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version for VeRL + Sglang by @xiazhahe in #5065
- [megatron, training_utils] fix: router replay R3 align router replay data with global layer indices by @HollowMan6 in #5037
- [trainer] fix: resolve dataset config in agent loop by @yyDing1 in #5034
- [ckpt,rollout] fix: sleep_replicas before save_ckpt to avoid OOM by @RobotGF in #5079
- [reward, ci] fix: colocate reward model ci break by @yyDing1 in #5084
- [reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True by @none0663 in #5054
- [rollout] fix: fix cpu allocation error in tensorrt_llm rollout manager by @SchumiDing in #5085
- Revert "[reward] fix: fix reward computation in _validate when use_reward_loop=True and reward_model.enable=True" by @wuxibin89 in #5091
- [trtllm] fix: minor fixes to trtllm rollout by @hchings in #5095
- [sglang] feat: add NPU GRPO training scripts for Qwen2.5-32B (FSDP/SGLang backends) by @xiazhahe in #5062
- [doc] chore: update arch image by @wuxibin89 in #5106
- [rollout] feat: automatically resume generation on abort by @wuxibin89 in #5071
- [sglang, doc] feat: add NPU GRPO training scripts for Qwen3-30B (Megatron/SGLang backends) and update doc by @hustmf in #5060
- [megatron] fix: megatron async save ckpt fix by @Leem-Li in #5016
- [model] feat: add NPU GRPO training scripts for Qwen2.5-32B/Qwen3-30B (Megatron/vLLM backends) by @psyloy in #4984
- [data] feat: Add support for Llama3.2-11-b-vision by @SchumiDing in #5112
- [vllm] feat: revert to the default behavior of cudagraph_mode by @vermouth1992 in #5109
- [fsdp] fix: Handle different transformers versions for Vision2Seq models in FSDP model merger by @liangxuZhang in #5108
- [megatron] feat: Support MTP training in SFT by @arvyanh in #4981
- [sglang] fix: update wiki to support speculative decode rollout by @ArronHZG in #5116
- [training_utils] fix: add upcasting for `seq_len_effective` to avoid potential overflow in `calculate_workload` by @albertcity in #5110
- [ci] feat: add npu workflow, e2e_sft_llm&model&reward_model_vllm by @yyyy2000 in #5039
- [doc] chore: update readme for SPEAR algorithm by @yuleiqin in #5124
- [vllm] feat: add shared memory support for weight transfer and IPC support checks by @jianjunzhong in #5089
- Revert "[rollout] feat: automatically resume generation on abort" by @PeterSH6 in #5127
- [sglang] fix: skip MoE router layers for FP8 quantization by @eternally-z in #5122
- [vllm] feat: get gpt-oss encoding on demand by @vermouth1992 in #5131
- [rollout,vllm] feat: revert default value of max_num_seqs by @RobotGF in #5139
- [misc] feat: Modify transformers dependency version in requirements by @vermouth1992 in #5141
- [ci] fix: fix ci by downgrade transformers to <5 by @vermouth1992 in #5143
- [misc] chore: rename huggingface-cli to hf to favor transformers v5 by @vermouth1992 in #5145
- [worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode by @RobotGF in #5144
- [megatron] fix: patched_routing accepts arbitrary args by @HollowMan6 in #5155
- Revert "[worker,rollout] refactor: remove set_expandable_segments calls in vllm separation mode" by @vermouth1992 in #5156
- [model, algo] feat: implement SAC algorithm and support Pi0.5 model by @Miical in #5118
- [training_utils] fix: resolve bugs and conflicts in the fully async path caused by multiple PRs by @ZLiao097 in #5100
- [reward] feat: split reward loop manager and agent loop manager by @yyDing1 in #5134
- [Doc] feat: update README to add new awesome project `RuleReasoner` by @jacklanda in #5157
- [trainer] feat: move save_ckpt before update_weights and validate by @RobotGF in #5137
- [ci,doc] feat: Add ascend_ci_guide by @yyyy2000 in #5163
- [rollout] fix: remove dtype cast by @vermouth1992 in #5117
- [hardware] fix: update architecture check and CANN toolkit path retrieval in device.py by @jianjunzhong in #5142
- [vllm] fix: build_app() missing 1 required positional argument: 'supported_tasks' by @HollowMan6 in #5093
- [perf] feat: clear megatron global buffer memory by @wuxibin89 in #5173
- [vllm, rollout] fix: Use different seeds for vllm by @victordion in #5179
- [ci] fix: fully async ci break by @yyDing1 in #5166
- [vllm] feat: make seed configurable and different among replicas by @vermouth1992 in #5181
- [data] fix: keyword video_metadata by @sophiayyya in #5177
- [megatron] fix: checkpoints use `fully_reshardable` by default when supported by @HollowMan6 in #5154
- [trtllm, rollout] test: add unittest by @hchings in #5102
- [reward] refactor: migrate all reward managers to the new asynchronous reward manager by @yyDing1 in #5189
- [vllm] fix: handle multimodal inputs correctly in full async mode by @Silas-11 in #5160
- [megatron] feat: fused kernel support for new model engine by @HollowMan6 in #5191
- [fsdp] feat: Merge lora in fsdp training to speed up rollout by @amzfang in #5115
- [megatron] Add Megatron-Bridge support in fully async policy by @eternally-z in #5196
- [perf] fix: infer server profiler args fix by @mengchengTang in #5121
- [doc, perf] feat: add perf_tuning_on_ascend by @tardis-key in #5104
- [ci] feat: add three npu workflow yml test by @daikang6 in #4978
- [vllm] fix: ignore MoE router layers for FP8 quantization by @zpqiu in #5107
- [worker, training_utils] fix: Metric Aggregation Across DP Ranks by @JacobHelwig in #5203
- [megatron] fix: add protections for `logits_processor_args.pop("loss_mask")`, which may cause the `forward_fn` of value net to collapse by @albertcity in #5204
- [trtllm] fix: reduce peak mem usage during update_weight() by @hchings in #5212
- [algo] feat: support rollout router replay in MegatronEngine by @xhx1022 in #5185
- [trtllm] fix: add synchronization before resume kv_cache to prevent oom in non-leader ranks by @shuyixiong in #5208
- [perf] feat: add images_seqlens on mfu calculation for engine_worker by @alwaysyiyu in #5207
- [reward] fix: Add assert to prevent reward NaN caused by overlong_cfg.len=0 by @ZLiao097 in #5216
- [recipe] refactor: refactor ray trainer for separate recipe use. (fully async / one step off) by @ArronHZG in #5184
- [BREAKING][reward] refactor: remove reward model worker code and invocation by @yyDing1 in #5194
- [fsdp] fix: Support trust_remote_code during FSDP HuggingFace checkpoint save by @thvasilo in #5200
- [worker] feat: Avoid redundant base weight sync when engine doesn't sleep by @JohnConnor123 in #5147
- [ci] chore: fix npu ci by @wucong25 in #5218
- [vllm] fix: apply moe weight loader patch for standard weight loading by @zjchenn in #5234
- [ci] chore: fix npu ci setuptools by @yyyy2000 in #5238
- [ci] chore: fix npu ci setuptools, keep update pip and packaging by @yyyy2000 in #5239
- [reward] fix: preserve input non_tensor_batch in AgentLoopManager when reward_loop_worker_handles is None by @none0663 in #5195
- [perf] fix: fix npu profiling scripts by @tongtong0613 in #5226
- [megatron] feat: use yaml to manage mbridge args by @Kite0011 in #4584
- [algo] feat: reduce routed expert padding via NestedTensor and uint8 dtype by @xhx1022 in #5240
- [ray,trainer] feat: add master port range configuration by @RobotGF in #5201
- [BREAKING][reward] refactor: deprecate batch reward manager by @yyDing1 in #5237
- [fsdp] feat: add script for qwen3next training on npu platform by @zjchenn in #5236
- [doc] fix: Update ascend_sglang_best_practices.rst by @hustmf in #5261
- [vllm, rollout] feat: update abort function with vllm internal pause_generation api by @PeterSH6 in #5253
- [veomni, trainer] feat: add rl support for veomni backend by @ji-huazhong in #4882
- [vllm] fix: run post-load weight processing once after async IPC sync by @zjchenn in #5235
- [doc] chore: version of dapo_multi_model_optimization_practice by @ChibiQuest in #5263
- [rollout] feat: make more rollout flags configurable to trtllm backend by @Superjomn in #5258
- [doc] refactor: update reward documents by @yyDing1 in #5272
- [doc] chore: Ascend retool practice doc by @LeoYao123 in #5266
- [vllm, rollout] fix: auto-downgrade cudagraph_mode to PIECEWISE when DCP is enabled by @Siritao in #5262
- [fsdp, veomni, trainer] fix: restrict npu-patch scope to avoid veomni backend interference by @ji-huazhong in #5268
- [BREAKING][reward] refactor: the full reward configuration by @yyDing1 in #5255
- [ci] chore: delete redundant npu ci by @yyyy2000 in #5259
- [fsdp, megatron] refactor: Refactor Fully Async Implementation via Engine Workers by @ZLiao097 in #5269
- [megatron, model] chore: add example of nemotron nano v3 by @ISEEKYAN in #5284
- [misc] chore: fix veomni_trainer.yaml by @wuxibin89 in #5285
- [megatron] fix: fallback to moe_router_padding_for_fp8 in router replay patch by @xhx1022 in #5283
- [reward] fix: backward compatibility with old reward config by @yyDing1 in #5287
- [reward] fix: reward model args and reward_kwargs bug by @yyDing1 in #5289
- [doc] chore: gspo update config and add version with npu by @chengminhua in #5279
- [fsdp,veomni] fix: remove FSDPUlyssesShardingManager to make eval_mode/train_mode reentrant by @wuxibin89 in #5305
- [veomni] refactor: Modify dp related parameters to align with FSDP backend and remove temporarily unsupported TP/PP/CP parameters by @ChengQianqian in #5303
- [trtllm] feat: use max utilization scheduler by default by @tongyuantongyu in #5302
- [worker, tool] fix: stabilize agent loop extra fields schema by @denismegerle in #5301
- [algo] feat: add NPU SAPO training script for Qwen3-8B (FSDP/vLLM backends) by @Vvictorrrr in #5257
- [fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-8B (FSDP/VLLM backends) by @zhihaofang1017 in #5250
- [fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends) by @alwaysyiyu in #5260
- [model,cfg] fix: type annotation for Lora target_modules by @thvasilo in #5223
- [megatron] feat: Support LoRA training with FP16 using Megatron-Bridge by @xichengpro in #4648
- [ci] fix: main pre-commit by @pengwu22 in #5318
- [misc] refactor: delete remaining batch-mode code in single controller by @ji-huazhong in #5319
- [rollout] fix: make skip rollout compatible with async mode by @ChengQianqian in #5320
- [veomni, trainer] fix: padding pixel value with padding_scale for vl model by @A1waysBeenHere in #5322
- [fsdp,algo] feat: add NVFP4 QAT (Quantization-Aware Training) support by @zhangyimi in #5190
- [docs] Add new awesome work using Verl by @MING-ZCH in #5328
- [vllm] feat: remove workers from vLLMHttpServer by @tongyx361 in #5330
- Revert "[vllm] feat: remove workers from vLLMHttpServer" by @PeterSH6 in #5333
- [misc] refactor: remove deprecated codes by @ji-huazhong in #5336
- [misc] fix: include config files for experimental entrypoints in package data by @guillemgt in #5343
- [ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile by @ji-huazhong in #5345
- Revert "[ci] chore: set torch-npu to 2.7.1.post2 in ascend dockerfile" by @ji-huazhong in #5353
- [reward] fix: empty class_dict for standalone reward model resource pool by @yyDing1 in #5348
- [trainer] feat: Add Torchtitan as alternative training engine by @acisseJZhong in #5051
- [training_utils] fix: mask out-of-bounds vocab entries in fused kernel LCE logsumexp by @EricMarcus-ai in #5349
- [rollout] fix: Include routed_experts in ToolAgentLoop return value to support R3 router replay by @mirrorboat in #5368
- [misc] fix: pass torch dtype when init random model by @HollowMan6 in #5370
- [ci] chore: pin version cupy-cuda12x==13.6.0 by @wuxibin89 in #5377
- [doc] chore: ascend add performance analysis guide and update some version info by @chengminhua in #5324
- [trainer] feat: Support RL trainer with TorchtitanEngine by @acisseJZhong in #5356
- [algo] feat: Exception for agg_loss when `dp_size > 1` but global information is absent & fix: correct & consistent loss aggregation for "seq-mean-token-sum-norm" by @tongyx361 in #5366
- [rollout] fix: make `run_uvicorn` behavior more reliable by @tongyuantongyu in #5383
- [doc] feat: update documentation for The Optimal Token Baseline and Rollout Correction by @jiawei415 in #5380
- [trainer] refactor: remove fsdp_sft_trainer.py by @wuxibin89 in #5382
- [ci] fix: occasional CI failures caused by sglang server port conflicts by @pengwu22 in #5310
- [fsdp] fix: add aggressive_empty_cache at end of init_model to prevent vLLM OOM by @EricMarcus-ai in #5384
- [doc, worker] feat: Enable Megatron-Bridge for MTP by @HollowMan6 in #5323
- [ckpt] feat: add kimi ckpt engine backend by @kip-cxj in #4954
- [misc] feat: ignore pyrightconfig.json to allow users to customize pyrightconfig to fix breaks by @tongyx361 in #5385
- [ci] chore: update triton-ascend and fix npu ut by @yyyy2000 in #5396
- [fsdp, megatron] feat: refactor fully-async and one-step-off training to support multiple checkpoint engine backends by @Shangwei-Li in #5029
- [doc] feat: add `fully async` and `one step off` to PR Checklist by @ArronHZG in #5404
- [doc] chore: ascend update gspo optimization practice document by @chengminhua in #5408
- [algo] feat: add DPPO with binary TV or binary KL implementation by @QPHutu in #5397
- [doc] chore: npu best practice doc by @hustmf in #5415
- [algo] fix: seq mean and default scale factor `loss_mask.shape[-1]` as in seq-mean-token-sum-norm by @tongyx361 in #5417
- [megatron] fix: missing model offload to CPU for forward_only mode by @xhx1022 in #5406
- [megatron] feat: enhance model offloading and loading for frozen parameters by @RobotGF in #5412
- [perf] fix: overwriting of Torch_profile with multiple steps by @Rhetee in #5395
- [trainer] feat: add padding for tensor alignment in preprocess_thd_no_padding function by @RobotGF in #5410
- [tool] fix: handle empty image inputs in ToolAgentLoop by @denismegerle in #5420
- [rollout, data] fix: honor train_max_samples/val_max_samples in fully async rollouter by @denismegerle in #5359
- [tool] refactor: remove tool schema plumbing from SingleTurnAgentLoop by @denismegerle in #5425
- [misc] feat: Add code for data grouping in no-padding scenario by @Kite0011 in #5424
- [doc] add Dr. MAS to awesome work by @langfengQ in #5427
- [BREAKING][rollout,cfg] refactor: get rid of actor_rollout_ref config from rollout by @wuxibin89 in #5418
- [ci] chore: bump the version of vllm-ascend to v0.11.0 in the ascend dockerfile by @ji-huazhong in #5431
- [doc] chore: fix npu docs by @wucong25 in #5428
- [doc] fix: fix npu retool docs by @LeoYao123 in #5449
- [data] refactor: TransferQueue - retire legacy integration codes by @0oshowero0 in #5454
- [ci] fix: failed trtllm_unit_tests with attribute error by @HollowMan6 in #5446
- [megatron] fix: pass dp_group to rearrange_micro_batches to fix DeepEP timeout by @xhx1022 in #5451
- [rollout] fix: remove unexpected concurrency bound at 1000 by @tongyuantongyu in #5402
- [data] fix: accept jsonl dataset files by @zqzten in #5456
- [single_controller] refactor: use BatchData to simplify concat and chunk in single_controller by @zw0610 in #5450
- [megatron] feat: Support DSA indexer LoRA mappings by @HollowMan6 in #5462
- [doc] fix: fix typo in agentic rl doc by @KevinZeng08 in #5461
- [misc] chore: support transformers 5 by @HollowMan6 in #5445
- [doc] fix: fix dapo multi model practice by @ChibiQuest in #5453
- [trainer] feat: Update trainer API for TorchtitanEngine by @acisseJZhong in #5457
- [rollout] refactor: bucketed transfer utils by @pengwu22 in #5309
- [rollout] feat: update trtllm docker by @Superjomn in #5386
- [doc] fix: fix npu retool doc by @LeoYao123 in #5467
- [ckpt] feat: add mooncake backend by @x1314aq in #5176
- [doc] chore: add ascend backend feature by @wucong25 in #5466
- [megatron] fix: support hybrid dense/MoE models in router replay with PP/VPP by @xhx1022 in #5452
- [megatron] fix: patch support newer mcore version by @HollowMan6 in #5372
- [ci] fix: sanity issue related to Last updated string by @HollowMan6 in #5477
- [rollout] feat: support auto resume on abort in FullyAsyncLLMServerManager by @wuxibin89 in #5430
- [trainer] feat: Support EP with TorchtitanEngine by @acisseJZhong in #5469
- docs: fix typo in kl_penalty docstring by @ZHAOoops in #5481
- [megatron] fix: add FP8 block quantization padding for EngineWorker by @zpqiu in #5440
- [ckpt, model] fix: preserve lora_alpha in model_merger via training meta by @Yatogaii in #5326
- [fsdp,algo] feat: Support QAT (NVFP4) in FSDPEngine for the unified engine_workers architecture by @zhangyimi in #5411
- [doc] feat: add mtp spec log by @ArronHZG in #5491
- [reward] feat: add example scripts for reward model usage by @yyDing1 in #5486
- [BREAKING][trtllm] feat: Add FP8 refit support for trtllm rollout by @shuyixiong in #5374
- [veomni,ci] fix: Modify default setting in veomni test scripts to prevent misunderstanding by @0oshowero0 in #5484
- [ckpt] fix: test issues of kimi and mooncake backend by @x1314aq in #5500
- [doc] chore: update FP8 guide with E2E training section and reorganization by @zpqiu in #5502
- [model,doc] feat: add qwen3 32B megatron 1k to 256k by @ChibiQuest in #5497
- [doc] chore: npu docker support vllm013 by @yyyy2000 in #5471
- [doc] fix: update recipe link to fix 404 not found by @tardis-key in #5286
- [ci] feat: add npu nightly ci by @daikang6 in #5225
- [data] fix: use %-style format placeholders in logger.warning() by @cavities12 in #5512
- [rollout] feat: global request-level load balancer single source routing by @aoshen524 in #5399
- [rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in #5149
- Revert "[rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout" by @wuxibin89 in #5525
- [ckpt] fix: Fix checkpoint engine backend unset error by @ZLiao097 in #5473
- [rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout by @SchumiDing in #5528
- [Megatron] feat: Support routing replay on NPU with performance and compatibility enhancements by @755651978 in #5298
- [rollout] fix: update checkpoint_engine bucket size parameter for Ascend compatibility by @nuerxiati in #5539
- [misc] feat: support dynamic bsz using group size by @Kite0011 in #5438
- [fully_async, one_step_off] feat: support auto resume on abort when using fully_async by @ArronHZG in #5487
- [doc] chore: add note for kimi ckpt engine by @kip-cxj in #5546
- [perf, trainer, training_utils] fix: try to monitor with mlflow up to 3 times, and avoid duplicate key processing in each step by @sheilaliuxl in #5548
- [trainer] fix: support nsys when using sft_trainer_ray.py by @arvyanh in #5489
- [rollout] fix: reintroduce NCCL_CUMEM_ENABLE for weight synchronization in async rollout environments by @RobotGF in #5522
- [ci] feat: npu nightly ci log is redirected to the specified directory by @daikang6 in #5557
- [ci] fix: sft_trainer_ray ci break by @wuxibin89 in #5562
- [fsdp] fix: wrap embed_tokens/lm_head by name for peft models by @cavities12 in #5516
- [ci] chore: update npu ci to vllm013 by @yyyy2000 in #5523
- [algo] feat: support router replay in MegatronEngine by @xhx1022 in #5219
- [docker] feat: update stable image to vllm==0.17.0, sglang==0.5.9 by @Begunner in #5542
- [megatron, model] feat: qwen3.5 example by @ISEEKYAN in #5381
- [algo] feat: add GDPO (Group reward-Decoupled Normalization Policy Optimization) algorithm by @Rhetee in #5503
- [megatron] feat: model engine support mtp by @ArronHZG in #5561
- [doc] fix: fix te pip install instructions by @TKONIY in #5501
- [rollout] fix: agent loop copy read-only routed_experts before torch conversion by @HollowMan6 in #5519
- [ci] chore: change machine for npu ci by @yyyy2000 in #5578
- [megatron] fix: apply override_transformer_config inside mindspeed engine to avoid conflict with other training engines by @ChengQianqian in #5589
- [rollout] fix: fix some compatibility issues with qwen vl series support of trtllm rollout by @SchumiDing in #5583
- [misc] chore: bump version to 0.7.1 by @wuxibin89 in #5602
## New Contributors
- @yyyy2000 made their first contribution in #4626
- @Junxiao-Zhao made their first contribution in #4798
- @cboss6 made their first contribution in #4810
- @wdl339 made their first contribution in #4799
- @zhihaofang1017 made their first contribution in #4753
- @Winston-Yuan made their first contribution in #4833
- @jiawei415 made their first contribution in #4678
- @XChen-Zero made their first contribution in #4842
- @khazic made their first contribution in #4854
- @jreiml made their first contribution in #4873
- @Prozac614 made their first contribution in #4900
- @hustmf made their first contribution in #4715
- @Tomsawyerhu made their first contribution in #4910
- @xhx1022 made their first contribution in #4934
- @walterchenchn made their first contribution in #4902
- @rfy48 made their first contribution in #4943
- @bjf-frz made their first contribution in #4888
- @JohnConnor123 made their first contribution in #4967
- @aphrodite1028 made their first contribution in #4970
- @AniZpZ made their first contribution in #4191
- @joyang-nv made their first contribution in #4665
- @Wheeeeeeeeels made their first contribution in #4991
- @letsgetai made their first contribution in #4998
- @moaead made their first contribution in #5002
- @hanhan-networking made their first contribution in #4885
- @Li-Yongwen made their first contribution in #4938
- @jianjunzhong made their first contribution in #4280
- @hchings made their first contribution in #5032
- @bithighrr made their first contribution in #5025
- @ChibiQuest made their first contribution in #5044
- @xiazhahe made their first contribution in #5065
- @SchumiDing made their first contribution in #5085
- @psyloy made their first contribution in #4984
- @liangxuZhang made their first contribution in #5108
- @arvyanh made their first contribution in #4981
- @albertcity made their first contribution in #5110
- @yuleiqin made their first contribution in #5124
- @eternally-z made their first contribution in #5122
- @Miical made their first contribution in #5118
- @jacklanda made their first contribution in #5157
- @victordion made their first contribution in #5179
- @sophiayyya made their first contribution in #5177
- @Silas-11 made their first contribution in #5160
- @amzfang made their first contribution in #5115
- @daikang6 made their first contribution in #4978
- @shuyixiong made their first contribution in #5208
- @alwaysyiyu made their first contribution in #5207
- @thvasilo made their first contribution in #5200
- @Superjomn made their first contribution in #5258
- @LeoYao123 made their first contribution in #5266
- @Siritao made their first contribution in #5262
- @ChengQianqian made their first contribution in #5303
- @tongyuantongyu made their first contribution in #5302
- @denismegerle made their first contribution in #5301
- @Vvictorrrr made their first contribution in #5257
- @zhangyimi made their first contribution in #5190
- @MING-ZCH made their first contribution in #5328
- @guillemgt made their first contribution in #5343
- @acisseJZhong made their first contribution in #5051
- @mirrorboat made their first contribution in #5368
- @kip-cxj made their first contribution in #4954
- @QPHutu made their first contribution in #5397
- @Rhetee made their first contribution in #5395
- @zqzten made their first contribution in #5456
- @KevinZeng08 made their first contribution in #5461
- @x1314aq made their first contribution in #5176
- @ZHAOoops made their first contribution in #5481
- @Yatogaii made their first contribution in #5326
- @cavities12 made their first contribution in #5512
- @755651978 made their first contribution in #5298
- @sheilaliuxl made their first contribution in #5548
- @TKONIY made their first contribution in #5501
**Full Changelog**: v0.7.0...v0.7.1