Highlights
Training
Megatron
- Megatron-FSDP mode for the Megatron backend (#5423).
- Dynamic context parallel (#5057) and CP for BSHD format (#5826).
- Checkpoint save as HF PEFT format (#5575).
- Qwen3.5 MTP SFT/RL (#5898).
- NVFP4 (W4A16) QAT training via ModelOpt (#5254) with QAT documentation (#5861).
VeOmni
- VeOmni engine for on-policy distillation (#6072) with native critic support (#6453) and native return_log_probs path (#6184).
- MoE router replay (R2/R3) support (#6325) and MoE load-balance monitoring (#6470).
- Qwen3.5 SP, GRPO demos for Qwen3-VL-30B-MOE / Qwen3.5-122b-a10b / Qwen3.5-30b (#6061, #5275, #6264, #6323).
Hardware / NPU
- NPU support for Liger-Kernel (#6244).
- Expandable segments (#5795, #6346).
- MXFP8 rollout on Ascend 950 (#5756).
Rollout
vLLM
- Split large weights into chunks in NCCL/NIXL checkpoint engine (#6091).
- MooncakeStoreConnector with hard-reset on weight update (#6373).
- Upgrade stable version to vllm==0.20.2 (#6393).
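The chunked weight transfer in #6091 can be pictured with a minimal sketch. This is not verl's implementation: `split_into_chunks` and the chunk size are hypothetical names chosen for illustration. It only shows the general idea of cutting one large weight's byte range into bounded pieces so that no single transfer buffer has to hold the whole tensor.

```python
def split_into_chunks(total_bytes: int, chunk_bytes: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover [0, total_bytes) in order.

    Hypothetical helper for illustration; not part of verl, vLLM, NCCL or NIXL.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    chunks = []
    offset = 0
    while offset < total_bytes:
        # The last chunk may be shorter than chunk_bytes.
        length = min(chunk_bytes, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# A 10-byte "weight" sent through a 4-byte buffer needs three transfers.
example_chunks = split_into_chunks(total_bytes=10, chunk_bytes=4)
```

With chunking, the peak buffer size is bounded by the chunk size rather than by the largest weight in the model.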
SGLang
- LoRA support for SGLang rollouts (merge + native adapter paths) (#5564).
- SGLang Prefill-Decode disaggregated rollout (#6117).
- Upgrade stable version to sglang==0.5.12 (#6435).
TensorRT-LLM
On-Policy Distillation (OPD)
- End-to-end on-policy distillation support across FSDP, Megatron and VeOmni backends (#5041).
- Text and multimodal, single/multiple teachers (#6051), sync (#5997) and fully-async (#6056) trainers.
- Reverse KL loss and forward top-k KL loss, with a fused top-K distillation kernel (#6511) and overlap metrics (#6469).
- Docs: https://verl.readthedocs.io/en/latest/algo/opd.html
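For readers new to the two loss names above, here is a minimal per-token sketch of their textbook definitions. It is not verl's code: the function names are hypothetical, and the top-k variant assumes a plain truncated sum over the teacher's k most likely tokens without renormalization. The linked docs describe what verl actually computes.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over the vocabulary at one position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits: list[float], teacher_logits: list[float]) -> float:
    """KL(student || teacher): the expectation is taken under the student."""
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def forward_topk_kl(student_logits: list[float], teacher_logits: list[float], k: int) -> float:
    """KL(teacher || student) summed only over the teacher's top-k tokens.

    Assumption: truncated sum, no renormalization of the top-k probabilities.
    """
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    top = sorted(range(len(t)), key=lambda i: t[i], reverse=True)[:k]
    return sum(math.exp(t[i]) * (t[i] - s[i]) for i in top)


# Student is uniform over two tokens; teacher puts 0.25 / 0.75 on them.
student = [0.0, 0.0]
teacher = [0.0, math.log(3.0)]
rkl = reverse_kl(student, teacher)
fkl_top1 = forward_topk_kl(student, teacher, k=1)
```

Reverse KL weights the gap by the student's own probabilities, which is why it pairs naturally with on-policy samples; the forward top-k form needs only the teacher's k highest log-probabilities per token.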
Trainer
Sync Trainer
- New sync trainer with TransferQueue to decouple control flow and data flow in the single controller (#5401).
- Up to 2x speedup in multimodal training on 128 GPUs; see the TransferQueue blog post.
- Support multiple output trajectories per agent loop.
- TBD: Fully async trainer with TransferQueue will land in the next release.
Fully Async Trainer
- Reuse trainer worker group for hybrid rollout during validation (#6076).
- On-policy distillation in fully async training (#6056).
- Qwen3-VL-30B-A3B and Qwen3-VL-8B fully async GRPO training scripts (#6131, #6006).
Agentic RL Training
- New project verl-project/uni-agent for building, running, and training general agents at scale.
- Support coding, search, and GUI agent RL training.
Tools & Reward
- Simpler function-based tool registration (#6189) with per-sample tool environment routing (#5978) and Gemma4 tool parser (#6406).
- Multi-step support in skip_rollout v2 (#5556) and improved error messages for malformed tool calls (#6055).
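As a rough picture of what function-based tool registration means in general, the sketch below registers a plain Python function through a decorator and derives a simple schema from its signature. Every name here (`TOOL_REGISTRY`, `register_tool`, the schema layout) is hypothetical and is not verl's API; see #6189 for the actual interface.

```python
import inspect
from typing import Callable

# Hypothetical registry for illustration only; not a verl object.
TOOL_REGISTRY: dict[str, dict] = {}


def register_tool(fn: Callable) -> Callable:
    """Record a plain function as a tool, reading its schema from the signature."""
    params = {
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in inspect.signature(fn).parameters.items()
    }
    TOOL_REGISTRY[fn.__name__] = {
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
        "call": fn,
    }
    return fn  # the function stays directly callable


@register_tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


# Dispatch a tool call by name, as an agent loop might after parsing model output.
result = TOOL_REGISTRY["add"]["call"](2, 3)
```

The appeal of this style is that the tool author writes one ordinary function, and the name, description, and parameter list are derived from it instead of being maintained in a separate class or config file.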
Breaking Changes
- Deprecate legacy FSDP and Megatron workers, migrate to the unified engine abstraction (#5604, #6067).
- Deprecate verl/interactions (#6074).
- Move LLMServerManager out of AgentLoopManager (#6129).
- Migrate Diffusion RL stack to verl-omni (#6200) and experimental/vla to a standalone verl-vla repository (#6162).
- Remove curriculum sampler + dynamic dataset + tool examples (#6302).
- main_ppo.py is deprecated with a warning in favor of main_ppo_sync.py (#6384).
What's Changed
- [ci] feat: add profiling tests to vLLM ci by @Gary-cjy in #5215
- [config] fix: sync chat_template from tokenizer to processor for multimodal base models (e.g. Qwen3.5) by @khazic in #5612
- [rollout] fix: DP > 1 hang with vllm rollout when training dense or MoE(EP=1) model by @NascentAscension in #5609
- [doc] refactor: rearrange ascend doc by @hustmf in #5620
- [Megatron] feat: Support compatibility enhancements of vp_stage by @ChibiQuest in #5580
- [model] chore: Fix Qwen3-235B precision issues on NPU by @autbuster in #5610
- [data, doc, misc] fix: fix outdated config keys in example scripts and docs by @AnikiFan in #5550
- [ci] chore: add npu one step off megatron ci and fix fsdp ci by @wucong25 in #5510
- [single_controller] fix: correct spelling error 'procecss' -> 'process' by @Fan-Yunfan in #5464
- [ci] fix: full async npu ci test by @ETOgaosion in #5647
- [doc] fix: modify ascend_quick_start for submodule recipe by @xiazhahe in #5642
- [algo, trainer] fix: pass missing old_log_probs to OTB estimators by @dubin555 in #5615
- [ci] fix: mcore deepseek CI test by @ETOgaosion in #5658
- [algo] feat: SAC Performance Improvements for Pi0.5 by @Miical in #5645
- [data] feat: optimized text filter process speed on transformer>=5.3.0 and run qw3.5 + aime data by @chenjiaoAngel in #5632
- [vllm] chore: fix mc2 used in vllm_ascend on A2 npu by @wucong25 in #5560
- [vllm] fix: npu disable flash_attn for RotaryEmbedding by @Mind-s in #5640
- [trainer] feat: Add Nemo-Automodel as alternative training engine by @HuiyingLi in #5407
- [fully_async] fix: fully_async ckpt save bug by @sl-1314 in #5677
- [megatron] fix: add megatron checkpoint patch by @Begunner in #5251
- [sglang,fsdp] feat: LoRA support for SGLang rollouts (merge + native adapter paths) by @cavities12 in #5564
- [1/n][vllm, rollout] feat: flowgrpo - support vllm-omni as rollout backend for verl by @knlnguyen1802 in #5616
- [training_utils] feat: use flash_attn cross_entropy loss in FusedLinearForPPO by @Luosuu in #5662
- [rollout,trtllm] fix: trtllm multinode rollout by @hchings in #5693
- [fsdp,megatron,vllm,trainer,algo] feat: On-Policy Distillation by @JacobHelwig in #5041
- [trainer] fix: fixed an issue where mindspeed's backend context parallelism feature functioned incorrectly by @ji-huazhong in #5697
- [megatron] fix: remove llama and qwen2 files by @ChengQianqian in #5707
- [trtllm,rollout] fix hang issue from VLM codepath by @hchings in #5701
- [vllm] fix: fp8 utils with vllm15 for moe model by @sophiayyya in #5661
- [one_step_off] fix: fix one-step-off update weights before rollout finished by @wucong25 in #5698
- [megatron, ckpt] fix: set dist_ckpt_optim_fully_reshardable default to False by @koanho in #5705
- [fsdp, perf, doc] fix: fix Liger integration for VL models and RL training, allowing liger speed improvement by @EricMarcus-ai in #5669
- [perf, trainer, training_utils, ray, worker] fix: Add set_numa_affinity() for engine workers: TrainingWorker. by @sheilaliuxl in #5627
- [megatron, vllm] feat: NVFP4 (W4A16) QAT training support via ModelOpt by @jQizhang in #5254
- [training_utils] fix: use response_lens.max() instead of offsets().max() for nested tensor max_response_len by @dubin555 in #5699
- [fsdp] fix: avoid NestedTensor jagged dim ambiguity for 3D position_ids by @Solus-sano in #5689
- [ci] fix: fix various ci failure by @wuxibin89 in #5717
- [rollout] fix: enable FP8 quantization for SGLang rollout in fully async mode. by @eternally-z in #5675
- [vllm] feat: Add support for the Qwen3_5MoeForCausalLM model On Ascend by @mikequan0425 in #5652
- [trainer] fix: skip dataloader state restore when resuming at epoch boundary by @yyZhangAI in #5725
- [algo] feat: Implement IcePop in rollout correction by @HollowMan6 in #5722
- [fully_async] chore: Add fully async dapo qwen3-30b npu script by @wangshuyang31 in #5653
- [model] fix: An end-to-end script for the 235b model is provided for the 256k long sequence by @autbuster in #5733
- [model] chore: Corrected the description of errors related to the 235b script and fixed the error in running the sft script. by @autbuster in #5732
- [misc] fix: make the assert user-friendly for get_tensordict by @stas00 in #5735
- [ci] fix: fix circular import in ci by @vermouth1992 in #5736
- [1/2][rollout,trainer] refactor: Teacher colocate mode -- Move teacher logprob computation to AsyncTeacherLLMServerManager by @JacobHelwig in #5723
- [misc] fix: supplement the dependencies that are missing in the requirements-npu.txt by @nuerxiati in #5740
- [ckpt] fix: handle string task_type in LoRA model merger by @FrankHo-Hwc in #5742
- [ci] chore: delete install current repository for npu ci by @yyyy2000 in #5748
- [BREAKING][trainer] feat: deprecate legacy engine fsdp and megatron workers by @wuxibin89 in #5604
- [2/2][rollout,trainer] feat: Teacher colocate mode by @JacobHelwig in #5745
- [trtllm, rollout] fix: partial loading logic by @hchings in #5728
- [trainer] fix: convert numpy types to native Python types in MultiTurnSFTDataset by @khazic in #5743
- [3/n][reward] feat: flowgrpo - support image-based rewards (rule-based & genrm) by @chenyingshu in #5713
- [ci] chore: delete mirror for npu ci by @yyyy2000 in #5758
- [fsdp] fix: pass dp_group to prepare_dynamic_batch to fix CUDA deadlock by @JenniferWang in #5591
- [megatron] feat: checkpoint save as HF PEFT format by @HollowMan6 in #5575
- [doc] refactor: add constraints on the use of vpp and mbridge parameters by @zjchenn in #5763
- [fully_async] fix: Patch vllm013 weight loader for qwen3-moe series by @wangshuyang31 in #5695
- [hardware, rollout] feat: enable MXFP8 rollout on Ascend 950 devices (DV100 & DV120) by @zhijie-os in #5756
- [rollout, tool] feat: support multi-step in skip_rollout v2 by @zyang6 in #5556
- [trainer] fix: MLFlow publishing metrics failure should be non-blocking. by @sheilaliuxl in #5771
- [ci] chore: add npu nightly ci for dapo-moonlight-16b-megatron and modify log path by @beirong8kmiles in #5734
- [trainer] feat: support use_remove_padding=False for mindspeed backend by @ji-huazhong in #5768
- [ci] fix: resolve oom when allocating weight transfer buffer in fully async test cases by @0lynnlin0 in #5791
- [fsdp, model] feat: add qwen3.5 fsdp grpo training support. by @Zhang1Sheng in #5682
- [2/n][rollout] feat: flowgrpo - add diffusion agent loop support by @AndyZhou952 in #5716
- [trainer] feat: enable expandable segment support for npu by @ji-huazhong in #5795
- [ci] feat: support Ascend A2/A3 docker image build pipeline for sglang by @xiazhahe in #5804
- [tool] chore: remove hard-code tool agent loop in fully async by @yyDing1 in #5816
- [megatron] feat: support dynamic CP by @ISEEKYAN in #5057
- [sglang, rollout] fix: wire up LoRA adapter path for engine_workers + sglang sleep by @cavities12 in #5769
- [env] fix: Modify the package installation sequence in the Ascend installation guide by @nuerxiati in #5819
- [rollout] fix: processor does not have image_processor. by @SanftMonster in #5823
- [single_controller] fix: Set device_name for split_resource_pool to prevent failure on NPU environments by @0oshowero0 in #5824
- [megatron] fix: pass use_distributed_optimizer to ddp_config in vanilla mbridge path by @khazic in #5775
- [reward] fix: disable signal.alarm() in math_verify to fix silent scoring failure in Ray workers by @farazkh80 in #5635
- [megatron] feat: support cp for bshd format by @wuxibin89 in #5826
- [megatron, fsdp] feat: DP workload balance for SFT by @arvyanh in #5679
- [doc] chore: add news for PyTorch Conference Europe 2026 by @HollowMan6 in #5847
- [doc] chore: update README.md by @wuxibin89 in #5850
- [misc] feat: add agent instructions, skills & improve CI for easier tests by @tongyx361 in #5846
- [rollout, trtllm] fix: add missing __init__.py to trtllm_rollout package by @Superjomn in #5857
- [misc] fix: license for verl/workers/rollout/trtllm_rollout/__init__.py by @tongyx361 in #5862
- [vllm] fix: Fix vLLM synchronization error caused by SGLang skipping resume optimize by @ZLiao097 in #5866
- [cfg] refactor: unify ppo_trainer and ppo_megatron_trainer config by @wuxibin89 in #5848
- [fully_async] chore: Update fully async dapo qwen3-30b npu script by @wangshuyang31 in #5864
- [doc] chore: add npu faq doc by @hustmf in #5871
- [docker, ci] fix: all CIs, transformers upgrade to 5.3.0 and vllm==0.18.0 by @ETOgaosion in #5724
- [doc] feat: add NVFP4 QAT documentation by @zhangyimi in #5861
- [megatron, cfg] feat: add Qwen3.5-122B Megatron launch script by @none0663 in #5874
- [tool] feat: verl integrate msprobe data collection by @Tjh-UKN in #5186
- [ci] fix: rename fsdp-vlm to megatron-vlm in trtllm cleanup needs by @Superjomn in #5880
- [megatron] fix: support critic model by @wuxibin89 in #5870
- [trainer] fix: handle empty response_mask in calculate_debug_metrics by @Jackie2049 in #5860
- [4/n][trainer] feat: flowgrpo - add diffusers + fsdp engine support by @zhtmike in #5802
- [ci] fix: fix machine label for nightly_ascend.yml by @yyyy2000 in #5887
- [cfg] fix: sync strategy from ActorConfig/CriticConfig to EngineConfig by @yifannnwu in #5885
- [megatron] fix: enable_routing_replay fails with MLATransformerConfig… by @NoonePauseferg in #5884
- [model] fix: replace inplace += with out-of-place addition in dummy visual forward by @reonokiy in #5881
- [megatron] fix: ValueError when unpacking preprocess_thd_engine result in router replay by @guillemgt in #5891
- [trainer] feat: add mindspeedllm backend engine support on NPU. by @pengnuoheng in #5680
- [ci, trtllm] test: speed up trtllm CI by using smaller models and reducing test parameters by @shikicloud in #5856
- [reward] fix: restore timeout in math_verify via ProcessPoolExecutor by @MaxwellJryao in #5839
- [ckpt, trainer] feat: Add plugin hooks for custom CheckpointEngineManager and CheckpointEngine by @NaomiEisen in #5718
- [doc] chore: Bug fixes for the qwen3-235b model in 256k scenarios by @autbuster in #5908
- [ckpt] fix: load custom_backend_module in CheckpointEngineManager on driver by @yangspirit in #5911
- [doc] fix: fix non‑compliant sections by @fh188 in #5913
- [megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU by @ZLiao097 in #5904
- [rollout] chore: bump up trtllm image version to 1.3.0rc10 by @Superjomn in #5841
- [trainer,perf] fix: enable profiler for SFT trainer by @wuxibin89 in #5909
- [misc] feat: Update file logger path output to absolute path by @vermouth1992 in #5924
- [training_utils, hardware] refactor: standardize deterministic environment variables for NCCL and NPU by @xuy1234 in #5923
- [ci] chore: add vllm_ascend.yaml by @Annarine in #5759
- Revert "[megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU" by @wuxibin89 in #5942
- [vllm] fix: remove redundant clone in weight refit by @wuxibin89 in #5934
- [ci] chore: add nightly npu docker for v0.7.1 by @yyyy2000 in #5930
- [sglang] fix: Adapting the use of _launch_subprocesses to the latest SGLang branch by @xiazhahe in #5868
- [megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU by @ZLiao097 in #5945
- [sglang] fix: sglang empty result problem by @Begunner in #5936
- [docker] feat: Add GB200 (aarch64/Blackwell) Docker image and training example by @kaixih in #5596
- [trainer] feat: add new trainer with TransferQueue by @wuxibin89 in #5401
- [megatron] fix: MTP loss deadlock when using context parallelism by @xhx1022 in #5895
- [ci] fix: indentation error in one step off policy e2e ci by @HollowMan6 in #5960
- [ci] chore: Update ascend related files code owner by @FightingZhen in #5982
- [fully_async]fix: terminated training when streaming_generation raise exception by @Zhikaiiii in #5977
- [rollout] fix: prevent engine_kwargs from overwriting KvCacheConfig in trtllm rollout by @Superjomn in #5939
- [ci] chore: Add veomni npu ci test by @wangshuyang31 in #5935
- [doc] chore: add rloo advantage estimator example script for npu by @zjchenn in #5950
- [trainer] fix: return NaN for empty tensors in compute_data_metrics by @Jackie2049 in #5899
- [reward] feat: add compute_score timing metrics to agent loop by @Stonesjtu in #5971
- [fully_async] feat: enable fully async to log_val_generations by @Begunner in #5988
- [ci, vllm] chore: update vllm-omni 0.18.0 official release and Miscellaneous by @AndyZhou952 in #5809
- [doc] fix: move low precision doc by @sophiayyya in #5994
- [rollout, vllm] fix: auto-convert disable_mm_preprocessor_cache to mm_processor_cache_gb for vllm >= 0.13.0 by @Silas-11 in #5961
- [fsdp] feat: qwen3.5 add npu docker file by @ruanhao566 in #5991
- [perf] feat: simplify precision_debugger config behavior and docs by @Tjh-UKN in #5986
- [doc] feat: move msprobe to ascend_tutorial by @tardis-key in #6004
- [misc, fully_async] feat: add Qwen3-VL-8B fully async GRPO training script on geo3k by @Silas-11 in #6006
- [megatron] fix: update patch for MLA flashattn forward by @HollowMan6 in #6005
- [veomni] feat: bump veomni to v0.1.8 by @deerlu in #5900
- [megatron] fix: add missing FP8 padding for router replay by @eternally-z in #5989
- [megatron, trainer] fix: respect calculate_entropy config in megatron actor update by @MaxwellJryao in #6016
- [ci] chore: add sglang new version docker for NPU by @xiazhahe in #6021
- [data] fix: pad data in preprocess_packed_seqs if shorter than align_size by @beirong8kmiles in #6001
- [tool, rollout, cfg] feat: per-sample tool environment routing for ToolAgentLoop by @pull-ups in #5978
- [fsdp] feat: qwen3.5 modify npu docker file based on CANN 8.5.2 by @ruanhao566 in #6017
- [ci] fix: update docker-build-ascend-a3-qwen3_5 by @yyyy2000 in #6022
- [fully_async] fix: add fully async grpo qwen3-235b npu script in main branch by @wangshuyang31 in #6012
- [veomni] feat: add DeepSeek-V3 to MOE_PARAM_HANDERS by @Luosuu in #5996
- [ci] chore: qwen3.5 add docker file add x86 CANN8.5.2 by @ruanhao566 in #6031
- [misc] chore: remove deprecated requirements.txt by @wuxibin89 in #6032
- [data, trainer] fix: batch padding for multi-trajectory by @ZhentaoFan in #5969
- [fully_async] fix: replace routed_experts on partial rollout resume i… by @NoonePauseferg in #6029
- [veomni] fix: use local paths for VeOmni model loading by @Luosuu in #6034
- [trainer,algo] feat: Support On-Policy Distillation in main_ppo_sync by @0oshowero0 in #5997
- [trainer] fix: add missing rollout dump and corrected validation logging in main_ppo_sync by @guillemgt in #6024
- [5/n][trainer] feat: flowgrpo trainer by @zhtmike in #5951
- [trainer, rollout, algo] refactor: Remove OPD colocate mode by @JacobHelwig in #6039
- [rollout] fix: RM sleep/wake teacher replicas by @JacobHelwig in #6041
- [fully_async] fix: preserve per-iteration routed_experts on partial rollout resume by @NoonePauseferg in #6046
- [misc] fix: project name refactor - volcengine -> verl by @ETOgaosion in #6053
- [veomni] feat: support Qwen3.5 SP and add GRPO trainer demo using VeOmniEngine by @deerlu in #6061
- [rollout] chore: single turn agent loop also enable rollout trace as tool loop by @pengwu22 in #6048
- [trainer,cfg,rollout,algo] feat: Multi-Teacher OPD by @JacobHelwig in #6051
- [fully_async] fix: avoid blocking ray.get inside async actor methods by @yxs in #6052
- [rollout] feat: add inter-node TRT-LLM rollout support for trtllm by @Superjomn in #5992
- [fully_async] fix: Add Mindspeed Patch for Async Training on Ascend NPUs by @HwCARI in #5967
- [fully_async, rollout, trainer, tool, cfg] fix: ROCm async training compatibility for AMD MI300X by @xiaohong42 in #6062
- [algo] fix: strip '+' suffix in kl_penalty so k3+/low_var_kl+ work by @MaxwellJryao in #6058
- [recipe, cfg] fix: update NPU script parameters for Qwen3 GSPO and DAPO math recipes by @zjchenn in #6066
- [fully_async] fix: fix rollouter/idle compute in async-mode by @chenjiaoAngel in #6069
- [BREAKING] [misc] refactor: deprecate workers, migrate to engines by @ETOgaosion in #6067
- [BREAKING] [env] refactor: deprecate verl/interactions by @ETOgaosion in #6074
- [veomni] feat: enable VeOmni engine for on-policy distillation by @hjshi84 in #6072
- docs: fix typo Hellasage -> HellaSwag by @nowang6 in #6084
- [fully_async] Fix: fix fully async profiler for first step. by @Shangwei-Li in #6070
- [fully_async] Fix: Remove _fit_torch_memory for separation trainer. by @Shangwei-Li in #6075
- [fsdp, perf] fix: skip redundant to(cuda) and gc.collect in train_mode when offload is disabled by @evmanz in #5753
- [ci] chore: change some npu ci test yml machine by @yyyy2000 in #6043
- [fully_async] fix: allow drain loop to resume early on parameter sync when partial_rollout is enabled by @Begunner in #6090
- [fsdp] feat: Qwen3.5 Adds Docker Files Based on CANN8.5.2 A2 by @ruanhao566 in #6098
- [trainer] fix: include uid and sort by uid in validation generation dumps in main_ppo_sync by @guillemgt in #6101
- [ci] chore: update npu docker build pipeline by @yyyy2000 in #6102
- [ci] fix: remove spmd test by @tardis-key in #6103
- [docker] chore: update npu v0.7.1 docker pipeline by @yyyy2000 in #6106
- [fully_async] Fix: fix megatron save and offload in case param_offload is on by @Shangwei-Li in #6095
- [sglang] feat: Patch sglang to support on-policy distillation teacher by @mingruimingrui in #6120
- [sglang] feat: restrict abort and resume requests to primary server only by @AkiRusProd in #6109
- [trainer] fix: preserve jagged tensor layout when rebuilding nested tensors with same sequence length by @huaiyizhao in #6127
- [rollout] feat: improve error messages for malformed tool calls by @xiefan46 in #6055
- [model] feat: support qwen35 mtp sft/rl by @zpltys in #5898
- [rollout] fix: truncate routed_experts to response_length to match truncated response_ids by @armorbreak001 in #6089
- [fully_async, reward] feat: enable GenRM/DisRM support in fully async training by @xiefan46 in #6044
- [trainer] fix: support non-last ragged dim in nested tensor rebuild by @xiefan46 in #6149
- [repo] refactor: move experimental/vla to standalone verl-vla repository by @Miical in #6162
- [doc] fix: update dapo multi model optimization practice by @ChibiQuest in #6161
- [algo, fsdp, megatron, cfg] fix: wire up sum_pi_squared for optimal_token_baseline by @startju in #6153
- [fully_async] fix: fix custom reward function register on async mode by @chenjiaoAngel in #6166
- [trainer] feat: extend CheckpointEngineManager and AgentLoopManager hooks to separation and one-step-off trainers by @NaomiEisen in #6173
- [vllm] chore: fix sleep_level in Ascend by @wucong25 in #6170
- [trainer] fix: convert numpy arrays to native types before dumping rollout JSONL by @nev8rz in #6167
- [doc] fix: update dapo multi model optimization practice by @ChibiQuest in #6169
- [doc] chore: fix ascend quick start by @wucong25 in #6174
- [rollout,vllm] feat: split large weight into chunks in NCCL/NIXL checkpoint engine by @wuxibin89 in #6091
- [ci] fix: engine_mindspeed_llm_rl_job switch to A3. by @pengnuoheng in #6020
- [ci] chore: add sgl_ascend cu for NPU by @Annarine in #6085
- [doc] chore: Add DART-GUI project to README by @Pengxiang-Li in #6192
- [worker] fix: optimizers don't have to be on same device as model by @HollowMan6 in #6196
- [trainer] fix: remove actor.dump_memory_snapshot by @tardis-key in #6198
- [worker] fix: grad_norm as non-tensor for metrics by @HollowMan6 in #6195
- [BREAKING][misc] refactor: migrate Diffusion RL stack to verl-omni by @SamitHuang in #6200
- [fsdp] fix: honor mixed_precision.param_dtype in forward_step autocast (#5932) by @shivam2199 in #6150
- [BREAKING][rollout] refactor: move LLMServerManager out of AgentLoopManager by @wuxibin89 in #6129
- [ci] chore: bump trtllm to 1.3.0rc13 and verl to v0.7.1 by @Superjomn in #6215
- [misc] refactor: re-format examples and deprecate old examples by @ETOgaosion in #6126
- [megatron] fix: avoid 2x peak host memory on Megatron model offload by @acmore in #6193
- [misc] fix: add missing __init__.py files to package directories by @guillemgt in #5209
- [veomni] feat: use VeOmni's native return_log_probs path to compute log_probs by @Luosuu in #6184
- [reward] refactor: refactor RM score assembly in reward loop by @zhtmike in #6242
- [tool] fix: In the memory snapshot collection logic, opening history records is not compatible with NPU by @shaanjiangcun in #6216
- [megatron] fix: fix seq_len pad len, and adapt to new mtp_loss api (for megatron dev branch) by @zpltys in #6206
- [ci] chore: remove fastmcp deps for sglang ci by @wuxibin89 in #6249
- [trainer] fix: dump all outputs in validation in main_ppo_sync by @guillemgt in #6227
- [rollout,vllm] fix: include Ray job id in colocated weight-transfer IPC path by @timothygao8710 in #6246
- [trainer] fix: update TorchTitanEngine for latest torchtitan API by @acisseJZhong in #6231
- [reward] fix: compute correct rollout world size by @guillemgt in #6226
- [fully_async, rollout] feat: enable online policy distillation in fully async training by @xiefan46 in #6056
- [sglang] feat: SGLang Prefill-Decode disaggregated rollout by @yxs in #6117
- [rollout] fix: guard sglang profiling when self.tokenizer_manager is None by @LeiDing191 in #6217
- [tool] feat: Memory snapshot collection, add functionality to clear history after collection. by @shaanjiangcun in #6248
- [doc] add RandOpt to readme awesome work by @sunrainyg in #6257
- [rollout] fix: trtllm rollout docker image and a few scripts by @hchings in #6230
- Revert "[reward] fix: compute correct rollout world size" by @wuxibin89 in #6258
- [doc] feat: add code reviewer by @ArronHZG in #6260
- [trainer] fix: write request_id to reward_extra_infos_to_dump instead of reward_extra_infos_dict by @boundless-future in #6251
- [ci] fix: remove the redundant config by @yyyy2000 in #6253
- [rollout] feat: enable Async RL for trtllm rollout by @hchings in #5631
- [ci] chore: bump trtllm to 1.3.0rc14 and pin mbridge by @Superjomn in #6262
- [veomni] feat: add Qwen3.5-122b-a10b GRPO trainer demo with EP enabled using VeOmniEngine by @deerlu in #6264
- [docker] feat: bump aarch64 vllm 0.17->0.18 by @kaixih in #6222
- [tool] feat: simpler function-based tool registration by @Begunner in #6189
- [doc] fix: router_replay is now under megatron by @HollowMan6 in #6272
- [reward, cfg] fix: correctly use RewardModelConfig in reward config by @guillemgt in #6265
- [ci] chore: slim down TRT-LLM CI by @Superjomn in #6275
- [doc] fix: correct module path in fully_async_policy documentation by @GJWu-zyx in #6287
- [megatron] fix: fix bugs when using position_ids in cp by @Kite0011 in #6267
- [ci] feat: add gspo qwen3-30b in nightly npu ci by @yyyy2000 in #6273
- [rollout] fix: skip_tokenizer_init=True for OPD teacher by @wuxibin89 in #6296
- [tool] refactor: tools will be initialized in AgentLoopWorker by @Begunner in #6300
- [reward, trainer] feat: support multi-output trajectories in async reward scoring by @guillemgt in #6228
- [BREAKING][tool, data] refactor: remove curriculum sampler + dynamic dataset + tool examples by @Begunner in #6302
- [model] chore: refactor npu scripts by @wucong25 in #6285
- [doc] chore: update vllm and vllm ascend from 0.13.0 to 0.18.0 in docs and dockerfile by @wangshuyang31 in #6291
- [megatron] fix: the NPU error that occurs after migrating from megatron worker to engine worker. by @xiazhahe in #6135
- [trainer] fix: combine REMAX sampled and greedy samples in one rollout request by @liziniu in #6308
- [vllm] refactor: MXFP8 support for ascend NPU by @quancs in #6307
- [megatron] feat: support Megatron-FSDP mode for Megatron backend by @conver334 in #5423
- [ci] chore: bump trtllm CI image to 1.3.0rc14 by @Superjomn in #6269
- [fully_async] feat: reuse trainer worker group for hybrid rollout to do validation by @ArronHZG in #6076
- [misc] refactor: re-format npu examples by @beirong8kmiles in #6286
- [ci] chore: add sglang ci for NPU by @xiazhahe in #6015
- [worker] feat: support log memory in engine worker by @yyyy2000 in #6270
- [doc] refactor: ascend doc refactor of precision guide and dockerfile build guidance by @yyyy2000 in #6298
- [data] fix: forward apply_chat_template_kwargs to system prompt measurement by @MohammadShahdad in #6305
- [fsdp] fix: FSDP2 silently drops fsdp_config.forward_prefetch by @memset0 in #6317
- [megatron] fix: fix bug with mcore0.12.1 + torch2.9.0 by @yyyy2000 in #6322
- [data, rollout] feat: add audio data support by @SanftMonster in #6276
- [doc] refactor: Ascend docs rectification, add parameter and metrics descriptions by @nuerxiati in #6294
- [doc] refactor: Ascend docs rectification, add new FAQ questions by @nuerxiati in #6328
- [model, fsdp] fix: Fix modeling_qwen2_5_vl missing attribute 'Qwen2RMSNorm' by @ZhuYajun-AI in #5901
- [fsdp, fully_async] feat: add Qwen3-VL-30B-A3B fully async GRPO training script on geo3k by @zhihaofang1017 in #6131
- [fsdp, fully_async] fix: fix CI import fast_pos_embed_interpolate in Qwen3-VL by @zhihaofang1017 in #6332
- [docker,vllm] feat: Enable DeepEP in ARM stable image by @kaixih in #6326
- [misc] chore: Update for vexact release by @pengwu22 in #6336
- [model, fsdp] fix: honor SP-rolled labels in fused kernels (#6068) by @shivam2199 in #6268
- [fsdp, ckpt] fix: drop tied target keys before HF save_pretrained by @ChangyiYang in #6334
- [fsdp] fix: lenient resolution of _no_split_modules in get_fsdp_wrap_policy by @SteadfastAsArt in #6290
- [doc] chore: split install guidance and quickstart by @Mengyuyang in #6337
- [trainer] feat: support ReMax in synchronous TransferQueue trainer by @liziniu in #6340
- [doc] chore: add npu advanced features by @wucong25 in #6339
- [doc] refactor: added the document that collects statistics on models and algorithms that support the NPU by @zhouhengan1211 in #6347
- [ci] fix: change model_path to local cache dir by @wuxibin89 in #6351
- [fsdp] fix: build no-padding attention mask from input ids by @anzhsoft in #6345
- [fsdp] fix: emit distillation outputs in use_remove_padding=False path (#6293) by @abinggo in #6350
- [tool] fix: tool response truncate side by @haoyang9804 in #6313
- [algo] fix: vectorized grpo low-variance scaling by @haoyang9804 in #6348
- [doc] chore: verl Ascend doc refactor by @hustmf in #6353
- [doc] chore: NPU model migration guidance by @Mind-s in #6330
- [doc] chore: fix ascend doc link by @hustmf in #6359
- Revert "[fsdp] fix: emit distillation outputs in use_remove_padding=False path (#6293)" by @wuxibin89 in #6360
- [fsdp, ckpt] chore: fold drop_tied_target_keys into top-level import by @ChangyiYang in #6356
- [doc] chore: OPD docs by @JacobHelwig in #6358
- [hardware] add DT flops by @brook-cpp in #6363
- [doc] fix: delete sglang_multiturn by @xvlincaigou in #6379
- [trainer,data] fix: Support merging extra_info for main_ppo_sync & update TQ dependency by @0oshowero0 in #6354
- [trainer] feat: add set_expandable_segments support for npu by @ji-huazhong in #6346
- [trainer] feat: deprecate main_ppo.py warning by @wuxibin89 in #6384
- [trainer] feat: async generation dump with exception propagation and streaming write by @Jackie2049 in #6324
- [doc] chore: announce VeRL-Omni pre-release in README News by @SamitHuang in #6390
- [doc] refactor: update rocm doc by @mingjielu in #6388
- [fsdp] fix: emit distillation outputs in use_remove_padding=False path by @abinggo in #6386
- [ci] fix: use TRTLLM_TEST_MODEL_PATH_ROOT in test_trtllm_rollout_utils by @Superjomn in #6385
- [misc] fix: use directory-symlink layout for shared skills by @tongyx361 in #6391
- [veomni] feat: Add GRPO training scripts for Qwen3-VL-30B-MOE (VeOmni Backends) by @phdddd in #5275
- [veomni] feat: add veomni qwen3-30b and fix ep by @wangshuyang31 in #6323
- [doc] chore: Update Ascend Docker build guidance by @anzhsoft in #6399
- [megatron, cfg] feat: add Qwen3.5-35B Megatron-Bridge launch script on Ascend by @Zhang1Sheng in #6318
- [perf, hardware] feat: NPU supports Liger-Kernel by @zheliuyu in #6244
- [fsdp] feat: Support zero2 optional feature for FSDP1 in engine worker. by @ZLiao097 in #6410
- Update installation instruction for TransferQueue by @emmericp in #6420
- [tool] feat: add Gemma4 tool parser with stop token and response formatting by @nanastassacos in #6406
- [fsdp] fix: sort buffers in fsdp2_load_full_state_dict to prevent NCCL deadlock with heterogeneous buffers by @nanastassacos in #6405
- [megatron] chore: refactor to use Megatron-Bridge new APIs by @HollowMan6 in #6335
- [doc] chore: Update ascend doc link in README.md by @hustmf in #6427
- [megatron] fix: set use_mbridge to True for some npu scripts by @zjchenn in #6429
- [doc] fix: fix ascend dockerfile_build_guidance as issue by @yyyy2000 in #6400
- [misc] fix: device variable not bound in some scripts by @zjchenn in #6430
- [megatron, cfg] feat: add Qwen3-VL-30B mbridge launch script on Ascend by @Seren-hao in #6443
- [docker] chore: update vllm 0.20.2 image by @ETOgaosion in #6393
- [sglang, one_step_off] fix: add free_cache_engine guard to resume_kv_cache by @dafu-wu in #6442
- [ckpt, model] fix: save LoRA train metadata for PPO actor checkpoint by @Yatogaii in #6409
- [fully_async] fix: initialize _dump_executor in FullyAsyncTrainer and FullyAsyncRollouter by @nanastassacos in #6438
- [ci] fix: replace the qwen25 model with the qwen3 model by @daikang6 in #6398
- [rollout] fix: fix fp8 for async RL and multinode rollout by @hchings in #6344
- [veomni] feat: add VeOmni-native critic support by @Luosuu in #6453
- [fully_async, doc] feat: remove future plans (almost all completed) by @ArronHZG in #6457
- [rollout] feat: add Trackio rollout trace logging by @abidlabs in #6423
- [model] refactor: clean up outdated Qwen2_5_vl code implementation by @ji-huazhong in #6445
- [megatron, rollout] fix: align MTP loss and rollout metrics by @xhx1022 in #6432
- [docker] fix: align stable vllm ARM version with x86 by @kaixih in #6460
- [ci, trainer] fix: fix code, scripts and st for mindspeedllm backend by @pengnuoheng in #6316
- [fsdp] fix: device mismatch between fsdp2 offload and weights transfer by @ETOgaosion in #6463
- [ci] chore: update npu docker to cann 9.0.0 by @yyyy2000 in #6466
- [veomni] feat: wire MoE load-balance monitor into VeOmni engine by @Luosuu in #6470
- [fully_async] feat: fully async profiling by @tardis-key in #6461
- [fsdp, megatron, trainer] feat: add top-k distillation overlap metrics by @Turingzero0 in #6469
- [trainer, rollout, cfg] feat: add extension points for custom worker configs by @Luosuu in #6489
- [veomni] fix: VeOmniEngineWithValueHead loads ForTokenClassification by @Luosuu in #6488
- [trainer] fix: gracefully shutdown trainer with TransferQueue by @wuxibin89 in #6491
- [veomni] feat: add MoE router replay (R2/R3) support by @hjshi84 in #6325
- [fsdp, model] feat: support qwen3_5 ulysses sp by @SaltFish11 in #6482
- [ci] chore: requirements-npu add triton-ascend by @wucong25 in #6493
- [rollout, vllm] fix: use engine.sleep() instead of collective_rpc by @dafu-wu in #6456
- [ci] chore: triton-ascend==3.2.1 need install path in docker by @yyyy2000 in #6498
- [docker] chore: upgrade sglang to 0.5.12 by @ETOgaosion in #6435
- [rollout, vllm] fix: treat null rollout seed as 0 for engine init by @SamitHuang in #6503
- [fsdp] fix: add sp and use_remove_padding validation for SFT and RL in fsdp engine by @fisherxu in #6502
- [rollout] feat: enable MooncakeStoreConnector with hard-reset on weight update by @aoshen02 in #6373
- [megatron] feat: ascend bump into megatron 016 by @wangshuyang31 in #6374
- [megatron, trainer] fix: preserve BSHD top-k distillation shape by @anzhsoft in #6506
- [hardware] chore: remove redundant Dockerfile.rocm7 by @Vivicai1005 in #6514
- [cfg] fix: align dataclass defaults with yaml by @anzhsoft in #6494
- [doc] feat: add uni-agent release to readme by @yyDing1 in #6516
- [ci] fix: Update Dockerfile.ascend_9.0.0_a3 by @yyyy2000 in #6517
- [ci] fix: remove the uninstall of triton in ascend docker by @yyyy2000 in #6524
- [ci] chore: add npu sglang nightly ci by @hustmf in #6521
- [veomni, fsdp] feat: enable fused top-K distillation kernel for OPD by @Luosuu in #6511
- [doc] chore: point AMD ROCm section at amd_quick_start.rst by @Vivicai1005 in #6518
- [ckpt] feat: pass global steps to checkpoint engines by @athreesh in #6507
- [doc] refactor: optimize ascend doc by @hustmf in #6532
- [doc] chore: fix verl ascend readme by @wucong25 in #6534
- [ci] chore: npu ci use cann9.0.0 by @daikang6 in #6520
- [veomni] fix: VeOmniEngineWithValueHead token-cls lookup default value for transformers v5 by @Luosuu in #6540
- [model] fix: support trl>=0.29 AutoModelForCausalLMWithValueHead import by @wenzhaoabc in #6539
- [ci] fix: FSDP actor/critic ci fail by @wuxibin89 in #6550
New Contributors
- @Gary-cjy made their first contribution in #5215
- @AnikiFan made their first contribution in #5550
- @Fan-Yunfan made their first contribution in #5464
- @HuiyingLi made their first contribution in #5407
- @knlnguyen1802 made their first contribution in #5616
- @Luosuu made their first contribution in #5662
- @koanho made their first contribution in #5705
- @Solus-sano made their first contribution in #5689
- @yyZhangAI made their first contribution in #5725
- @stas00 made their first contribution in #5735
- @FrankHo-Hwc made their first contribution in #5742
- @chenyingshu made their first contribution in #5713
- @JenniferWang made their first contribution in #5591
- @zhijie-os made their first contribution in #5756
- @zyang6 made their first contribution in #5556
- @beirong8kmiles made their first contribution in #5734
- @0lynnlin0 made their first contribution in #5791
- @Zhang1Sheng made their first contribution in #5682
- @AndyZhou952 made their first contribution in #5716
- @SanftMonster made their first contribution in #5823
- @farazkh80 made their first contribution in #5635
- @Tjh-UKN made their first contribution in #5186
- @Jackie2049 made their first contribution in #5860
- @zhtmike made their first contribution in #5802
- @yifannnwu made their first contribution in #5885
- @NoonePauseferg made their first contribution in #5884
- @reonokiy made their first contribution in #5881
- @pengnuoheng made their first contribution in #5680
- @shikicloud made their first contribution in #5856
- @MaxwellJryao made their first contribution in #5839
- @NaomiEisen made their first contribution in #5718
- @yangspirit made their first contribution in #5911
- @fh188 made their first contribution in #5913
- @xuy1234 made their first contribution in #5923
- @kaixih made their first contribution in #5596
- @Zhikaiiii made their first contribution in #5977
- @Stonesjtu made their first contribution in #5971
- @ruanhao566 made their first contribution in #5991
- @deerlu made their first contribution in #5900
- @pull-ups made their first contribution in #5978
- @ZhentaoFan made their first contribution in #5969
- @yxs made their first contribution in #6052
- @HwCARI made their first contribution in #5967
- @xiaohong42 made their first contribution in #6062
- @nowang6 made their first contribution in #6084
- @evmanz made their first contribution in #5753
- @AkiRusProd made their first contribution in #6109
- @xiefan46 made their first contribution in #6055
- @armorbreak001 made their first contribution in #6089
- @startju made their first contribution in #6153
- @nev8rz made their first contribution in #6167
- @Pengxiang-Li made their first contribution in #6192
- @SamitHuang made their first contribution in #6200
- @shivam2199 made their first contribution in #6150
- @acmore made their first contribution in #6193
- @shaanjiangcun made their first contribution in #6216
- @timothygao8710 made their first contribution in #6246
- @LeiDing191 made their first contribution in #6217
- @sunrainyg made their first contribution in #6257
- @boundless-future made their first contribution in #6251
- @GJWu-zyx made their first contribution in #6287
- @MohammadShahdad made their first contribution in #6305
- @memset0 made their first contribution in #6317
- @ZhuYajun-AI made their first contribution in #5901
- @SteadfastAsArt made their first contribution in #6290
- @Mengyuyang made their first contribution in #6337
- @zhouhengan1211 made their first contribution in #6347
- @anzhsoft made their first contribution in #6345
- @abinggo made their first contribution in #6350
- @haoyang9804 made their first contribution in #6313
- @brook-cpp made their first contribution in #6363
- @mingjielu made their first contribution in #6388
- @phdddd made their first contribution in #5275
- @emmericp made their first contribution in #6420
- @nanastassacos made their first contribution in #6406
- @dafu-wu made their first contribution in #6442
- @abidlabs made their first contribution in #6423
- @Turingzero0 made their first contribution in #6469
- @SaltFish11 made their first contribution in #6482
- @fisherxu made their first contribution in #6502
- @aoshen02 made their first contribution in #6373
- @Vivicai1005 made their first contribution in #6514
- @athreesh made their first contribution in #6507
- @wenzhaoabc made their first contribution in #6539
Full Changelog: v0.7.1...v0.8.0