GitHub release: verl-project/verl v0.8.0

Highlights

Training

Megatron

  • Megatron-FSDP mode for the Megatron backend (#5423).
  • Dynamic context parallelism (#5057) and context parallelism (CP) for the BSHD format (#5826).
  • Save checkpoints in Hugging Face PEFT format (#5575).
  • Qwen3.5 MTP (multi-token prediction) SFT/RL support (#5898).
  • NVFP4 (W4A16) QAT training via ModelOpt (#5254) with QAT documentation (#5861).
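
For readers new to NVFP4 QAT: in W4A16 mode the weights are fake-quantized to 4-bit E2M1 values, with a scale shared by each small block, during the forward pass, while activations stay in 16-bit precision. Gradients typically pass through the rounding via a straight-through estimator. The minimal pure-Python sketch below shows only the weight rounding step. It assumes 16-element blocks and keeps the per-block scale in full precision (real NVFP4 also encodes that scale in FP8 E4M3 alongside a per-tensor FP32 scale), and `fake_quant_nvfp4` is a hypothetical helper, not a ModelOpt or verl API.

```python
# Simplified NVFP4-style fake quantization of weights (W4A16 QAT sketch).
# Assumptions: 16-element blocks; per-block scale kept in full precision
# (real NVFP4 stores it as FP8 E4M3 plus a per-tensor FP32 scale).

# Non-negative magnitudes representable by FP4 E2M1.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_GRID[-1]


def _round_to_fp4(x: float) -> float:
    """Round x to the nearest E2M1 magnitude, keeping its sign."""
    mag = min(FP4_E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag


def fake_quant_nvfp4(weights: list[float], block_size: int = 16) -> list[float]:
    """Quantize then dequantize each block so the forward pass sees FP4 error."""
    out: list[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        # Map the block's largest magnitude onto the top of the FP4 range.
        scale = amax / FP4_MAX if amax > 0 else 1.0
        out.extend(_round_to_fp4(w / scale) * scale for w in block)
    return out
```

Each dequantized value stays within one scale unit of the original, because the largest gap in the E2M1 grid (between 4 and 6) is two units.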

VeOmni

  • VeOmni engine support for on-policy distillation (#6072), native critic support (#6453), and the native return_log_probs path for computing log-probs (#6184).
  • MoE router replay (R2/R3) support (#6325) and MoE load-balance monitoring (#6470).
  • Qwen3.5 sequence parallelism (SP) support, plus GRPO demos for Qwen3.5, Qwen3-VL-30B-MoE, Qwen3.5-122B-A10B, and Qwen3-30B (#6061, #5275, #6264, #6323).
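
MoE load-balance monitoring tracks how evenly the router spreads tokens across experts. As an illustration only, the sketch below assumes the monitor sees each token's top-k expert ids and reports two common imbalance statistics, the max-to-mean load ratio and the coefficient of variation. The function name and the choice of metrics are hypothetical and are not necessarily what the VeOmni engine logs.

```python
import math
from collections import Counter


def expert_load_stats(topk_indices: list[list[int]], num_experts: int) -> dict:
    """Summarize expert load from per-token top-k routing decisions.

    topk_indices: for each token, the list of expert ids it was routed to.
    """
    counts = Counter(e for token in topk_indices for e in token)
    loads = [counts.get(e, 0) for e in range(num_experts)]
    mean = sum(loads) / num_experts
    if mean == 0:
        return {"loads": loads, "max_over_mean": 0.0, "cv": 0.0}
    variance = sum((x - mean) ** 2 for x in loads) / num_experts
    return {
        "loads": loads,
        # 1.0 means perfectly balanced; larger means a hot expert.
        "max_over_mean": max(loads) / mean,
        # Coefficient of variation: 0.0 means perfectly balanced.
        "cv": math.sqrt(variance) / mean,
    }
```

For example, routing `[[0, 1], [0, 2], [0, 1], [3, 0]]` over 4 experts gives loads `[4, 2, 1, 1]` and a max-to-mean ratio of 2.0, flagging expert 0 as overloaded.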

Hardware / NPU

  • NPU support for Liger-Kernel (#6244).
  • Expandable-segments memory allocation on NPU (#5795, #6346).
  • MXFP8 rollout on Ascend 950 (#5756).

Rollout

vLLM

  • Split large weights into chunks in NCCL/NIXL checkpoint engine (#6091).
  • MooncakeStoreConnector with hard-reset on weight update (#6373).
  • Upgrade stable version to vllm==0.20.2 (#6393).
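
Chunking bounds the size of each transfer: instead of staging one very large weight in a single buffer, the sender ships fixed-size slices tagged with their offsets, and the receiver writes them back in place. A minimal sketch on raw bytes, assuming a hypothetical `chunk_size` parameter; it illustrates the idea only and is not the NCCL/NIXL checkpoint engine's actual code.

```python
def split_into_chunks(payload: bytes, chunk_size: int) -> list[tuple[int, bytes]]:
    """Split a serialized weight into (offset, slice) pairs of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, payload[offset:offset + chunk_size])
        for offset in range(0, len(payload), chunk_size)
    ]


def reassemble(chunks: list[tuple[int, bytes]], total_size: int) -> bytes:
    """Write chunks into a preallocated buffer; arrival order does not matter."""
    buf = bytearray(total_size)
    for offset, data in chunks:
        buf[offset:offset + len(data)] = data
    return bytes(buf)
```

Because every chunk carries its offset, the receiver can place slices as they arrive, and peak buffer usage per transfer is capped at `chunk_size` regardless of the weight's size.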

SGLang

  • LoRA support for SGLang rollouts (merge + native adapter paths) (#5564).
  • SGLang Prefill-Decode disaggregated rollout (#6117).
  • Upgrade stable version to sglang==0.5.12 (#6435).
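
The merge path for LoRA folds the adapter into the base weight before it is sent to the rollout engine, W' = W + (alpha / r) * B A, whereas the native adapter path keeps W untouched and lets SGLang apply A and B at inference time. A minimal pure-Python sketch of the merge, assuming the standard alpha / r scaling (PEFT's default when rsLoRA is off); `merge_lora` and `matmul` are hypothetical helpers, not verl or SGLang code.

```python
def matmul(x: list[list[float]], y: list[list[float]]) -> list[list[float]]:
    """Plain nested-list matrix product x @ y."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(W, A, B, alpha: float):
    """Return W + (alpha / r) * B @ A.

    Shapes: W is (d_out, d_in), B is (d_out, r), A is (r, d_in).
    """
    r = len(A)
    if len(B[0]) != r:
        raise ValueError("B must have r columns to match A's r rows")
    scale = alpha / r
    delta = matmul(B, A)
    return [
        [W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
        for i in range(len(W))
    ]
```

The merged weight produces the same outputs as running the base layer and the adapter separately, which is why both paths are interchangeable for rollout; the trade-off is a weight sync per update (merge) versus engine-side adapter support (native).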

TensorRT-LLM

  • Async RL for TRT-LLM rollout, including inter-node support (#5631, #5992).

On-Policy Distillation (OPD)

Trainer

Sync Trainer

  • New sync trainer with TransferQueue to decouple control flow and data flow in the single controller (#5401).
  • Up to 2x speedup in multimodal training on 128 GPUs (see the TransferQueue blog post).
  • Support for multiple output trajectories per agent loop.
  • Planned: a fully async trainer with TransferQueue will arrive in the next release.
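
The idea behind decoupling control flow from data flow is that the single controller only schedules lightweight keys, while workers read the actual samples from a shared store, so large tensors never pass through the controller. A toy sketch of that pattern, assuming an in-process dict as the store; `ToyDataStore`, `controller_dispatch`, and `worker_step` are hypothetical names and do not reflect TransferQueue's real API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyDataStore:
    """Hypothetical stand-in for a shared data store keyed by sample id."""

    _data: dict[str, Any] = field(default_factory=dict)

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, keys: list[str]) -> list[Any]:
        return [self._data[k] for k in keys]


def controller_dispatch(sample_ids: list[str], num_workers: int) -> list[list[str]]:
    """Control flow: the controller partitions keys only, never payloads."""
    return [sample_ids[i::num_workers] for i in range(num_workers)]


def worker_step(store: ToyDataStore, keys: list[str]) -> dict[str, int]:
    """Data flow: each worker pulls its own samples straight from the store."""
    batch = store.get(keys)
    # Stand-in for real computation: report each sample's length.
    return {k: len(sample) for k, sample in zip(keys, batch)}
```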

Fully Async Trainer

  • Reuse trainer worker group for hybrid rollout during validation (#6076).
  • Online policy distillation in fully async training (#6056).
  • Qwen3-VL-30B-A3B and Qwen3-VL-8B fully async GRPO training scripts (#6131, #6006).

Agentic RL Training

  • New project verl-project/uni-agent for building, running, and training general agents at scale.
  • Supports RL training for coding, search, and GUI agents.

Tools & Reward

  • Simpler function-based tool registration (#6189), per-sample tool environment routing (#5978), and a Gemma4 tool parser (#6406).
  • Multi-step support in skip_rollout v2 (#5556) and improved error messages for malformed tool calls (#6055).
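
Function-based registration lets a plain Python function become a tool, with its schema derived from the signature instead of a hand-written class. The sketch below shows the general pattern with a decorator and `inspect`; the `tool` decorator, `TOOL_REGISTRY`, `call_tool`, and the schema layout are hypothetical and are not verl's actual registration API (see #6189 for that).

```python
import inspect
from typing import Any, Callable

# Hypothetical global registry: tool name -> {"schema": ..., "fn": ...}.
TOOL_REGISTRY: dict[str, dict[str, Any]] = {}
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def tool(fn: Callable) -> Callable:
    """Register fn as a tool, deriving a JSON-schema-like spec from its signature."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    TOOL_REGISTRY[fn.__name__] = {
        "schema": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
        "fn": fn,
    }
    return fn


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch a parsed tool call to the registered function."""
    return TOOL_REGISTRY[name]["fn"](**arguments)


@tool
def add(a: int, b: int = 0) -> int:
    """Add two integers."""
    return a + b
```

The decorator returns the function unchanged, so registered tools remain ordinary callables that can be unit-tested directly.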

Breaking Changes

  • Deprecate the legacy FSDP and Megatron workers in favor of the unified engine abstraction (#5604, #6067).
  • Deprecate verl/interactions (#6074).
  • Move LLMServerManager out of AgentLoopManager (#6129).
  • Migrate Diffusion RL stack to verl-omni (#6200) and experimental/vla to a standalone verl-vla repository (#6162).
  • Remove the curriculum sampler, dynamic dataset, and tool examples (#6302).
  • main_ppo.py is deprecated in favor of main_ppo_sync.py and now emits a deprecation warning (#6384).

What's Changed

  • [ci] feat: add profiling tests to vLLM ci by @Gary-cjy in #5215
  • [config] fix: sync chat_template from tokenizer to processor for multimodal base models (e.g. Qwen3.5) by @khazic in #5612
  • [rollout] fix: DP > 1 hang with vllm rollout when training dense or MoE(EP=1) model by @NascentAscension in #5609
  • [doc] refactor: rearrange ascend doc by @hustmf in #5620
  • [Megatron] feat: Support compatibility enhancements of vp_stage by @ChibiQuest in #5580
  • [model] chore: Fix Qwen3-235B precision issues on NPU by @autbuster in #5610
  • [data, doc, misc] fix: fix outdated config keys in example scripts and docs by @AnikiFan in #5550
  • [ci] chore: add npu one step off megatron ci and fix fsdp ci by @wucong25 in #5510
  • [single_controller] fix: correct spelling error 'procecss' -> 'process' by @Fan-Yunfan in #5464
  • [ci] fix: full async npu ci test by @ETOgaosion in #5647
  • [doc] fix: modify ascend_quick_start for submodule recipe by @xiazhahe in #5642
  • [algo, trainer] fix: pass missing old_log_probs to OTB estimators by @dubin555 in #5615
  • [ci] fix: mcore deepseek CI test by @ETOgaosion in #5658
  • [algo] feat: SAC Performance Improvements for Pi0.5 by @Miical in #5645
  • [data] feat: optimized text filter process speed on transformer>=5.3.0 and run qw3.5 + aime data by @chenjiaoAngel in #5632
  • [vllm] chore: fix mc2 used in vllm_ascend on A2 npu by @wucong25 in #5560
  • [vllm] fix: npu disable flash_attn for RotaryEmbedding by @Mind-s in #5640
  • [trainer] feat: Add Nemo-Automodel as alternative training engine by @HuiyingLi in #5407
  • [fully_async] fix: fully_async ckpt save bug by @sl-1314 in #5677
  • [megatron] fix: add megatron checkpoint patch by @Begunner in #5251
  • [sglang,fsdp] feat: LoRA support for SGLang rollouts (merge + native adapter paths) by @cavities12 in #5564
  • [1/n][vllm, rollout] feat: flowgrpo - support vllm-omni as rollout backend for verl by @knlnguyen1802 in #5616
  • [training_utils] feat: use flash_attn cross_entropy loss in FusedLinearForPPO by @Luosuu in #5662
  • [rollout,trtllm] fix: trtllm multinode rollout by @hchings in #5693
  • [fsdp,megatron,vllm,trainer,algo] feat: On-Policy Distillation by @JacobHelwig in #5041
  • [trainer] fix: fixed an issue where mindspeed's backend context parallelism feature functioned incorrectly by @ji-huazhong in #5697
  • [megatron] fix: remove llama and qwen2 files by @ChengQianqian in #5707
  • [trtllm,rollout] fix hang issue from VLM codepath by @hchings in #5701
  • [vllm] fix: fp8 utils with vllm15 for moe model by @sophiayyya in #5661
  • [one_step_off] fix: fix one-step-off update weights before rollout finished by @wucong25 in #5698
  • [megatron, ckpt] fix: set dist_ckpt_optim_fully_reshardable default to False by @koanho in #5705
  • [fsdp, perf, doc] fix: fix Liger integration for VL models and RL training, allowing liger speed improvement by @EricMarcus-ai in #5669
  • [perf, trainer, training_utils, ray, worker] fix: Add set_numa_affinity() for engine workers: TrainingWorker. by @sheilaliuxl in #5627
  • [megatron, vllm] feat: NVFP4 (W4A16) QAT training support via ModelOpt by @jQizhang in #5254
  • [training_utils] fix: use response_lens.max() instead of offsets().max() for nested tensor max_response_len by @dubin555 in #5699
  • [fsdp] fix: avoid NestedTensor jagged dim ambiguity for 3D position_ids by @Solus-sano in #5689
  • [ci] fix: fix various ci failure by @wuxibin89 in #5717
  • [rollout] fix: enable FP8 quantization for SGLang rollout in fully async mode. by @eternally-z in #5675
  • [vllm] feat: Add support for the Qwen3_5MoeForCausalLM model On Ascend by @mikequan0425 in #5652
  • [trainer] fix: skip dataloader state restore when resuming at epoch boundary by @yyZhangAI in #5725
  • [algo] feat: Implement IcePop in rollout correction by @HollowMan6 in #5722
  • [fully_async] chore: Add fully async dapo qwen3-30b npu script by @wangshuyang31 in #5653
  • [model] fix: An end-to-end script for the 235b model is provided for the 256k long sequence by @autbuster in #5733
  • [model] chore: Corrected the description of errors related to the 235b script and fixed the error in running the sft script. by @autbuster in #5732
  • [misc] fix: make the assert user-friendly for get_tensordict by @stas00 in #5735
  • [ci] fix: fix circular import in ci by @vermouth1992 in #5736
  • [1/2][rollout,trainer] refactor: Teacher colocate mode -- Move teacher logprob computation to AsyncTeacherLLMServerManager by @JacobHelwig in #5723
  • [misc] fix: supplement the dependencies that are missing in the requirements-npu.txt by @nuerxiati in #5740
  • [ckpt] fix: handle string task_type in LoRA model merger by @FrankHo-Hwc in #5742
  • [ci] chore: delete install current repository for npu ci by @yyyy2000 in #5748
  • [BREAKING][trainer] feat: deprecate legacy engine fsdp and megatron workers by @wuxibin89 in #5604
  • [2/2][rollout,trainer] feat: Teacher colocate mode by @JacobHelwig in #5745
  • [trtllm, rollout] fix: partial loading logic by @hchings in #5728
  • [trainer] fix: convert numpy types to native Python types in MultiTurnSFTDataset by @khazic in #5743
  • [3/n][reward] feat: flowgrpo - support image-based rewards (rule-based & genrm) by @chenyingshu in #5713
  • [ci] chore: delete mirror for npu ci by @yyyy2000 in #5758
  • [fsdp] fix: pass dp_group to prepare_dynamic_batch to fix CUDA deadlock by @JenniferWang in #5591
  • [megatron] feat: checkpoint save as HF PEFT format by @HollowMan6 in #5575
  • [doc] refactor: add constraints on the use of vpp and mbridge parameters by @zjchenn in #5763
  • [fully_async] fix: Patch vllm013 weight loader for qwen3-moe series by @wangshuyang31 in #5695
  • [hardware, rollout] feat: enable MXFP8 rollout on Ascend 950 devices (DV100 & DV120) by @zhijie-os in #5756
  • [rollout, tool] feat: support multi-step in skip_rollout v2 by @zyang6 in #5556
  • [trainer] fix: MLFlow publishing metrics failure should be non-blocking. by @sheilaliuxl in #5771
  • [ci] chore: add npu nightly ci for dapo-moonlight-16b-megatron and modify log path by @beirong8kmiles in #5734
  • [trainer] feat: support use_remove_padding=False for mindspeed backend by @ji-huazhong in #5768
  • [ci] fix: resolve oom when allocating weight transfer buffer in fully async test cases by @0lynnlin0 in #5791
  • [fsdp, model] feat: add qwen3.5 fsdp grpo training support. by @Zhang1Sheng in #5682
  • [2/n][rollout] feat: flowgrpo - add diffusion agent loop support by @AndyZhou952 in #5716
  • [trainer] feat: enable expandable segment support for npu by @ji-huazhong in #5795
  • [ci] feat: support Ascend A2/A3 docker image build pipeline for sglang by @xiazhahe in #5804
  • [tool] chore: remove hard-code tool agent loop in fully async by @yyDing1 in #5816
  • [megatron] feat: support dynamic CP by @ISEEKYAN in #5057
  • [sglang, rollout] fix: wire up LoRA adapter path for engine_workers + sglang sleep by @cavities12 in #5769
  • [env] fix: Modify the package installation sequence in the Ascend installation guide by @nuerxiati in #5819
  • [rollout] fix: processor does not have image_processor. by @SanftMonster in #5823
  • [single_controller] fix: Set device_name for split_resource_pool to prevent failure on NPU environments by @0oshowero0 in #5824
  • [megatron] fix: pass use_distributed_optimizer to ddp_config in vanilla mbridge path by @khazic in #5775
  • [reward] fix: disable signal.alarm() in math_verify to fix silent scoring failure in Ray workers by @farazkh80 in #5635
  • [megatron] feat: support cp for bshd format by @wuxibin89 in #5826
  • [megatron, fsdp] feat: DP workload balance for SFT by @arvyanh in #5679
  • [doc] chore: add news for PyTorch Conference Europe 2026 by @HollowMan6 in #5847
  • [doc] chore: update README.md by @wuxibin89 in #5850
  • [misc] feat: add agent instructions, skills & improve CI for easier tests by @tongyx361 in #5846
  • [rollout, trtllm] fix: add missing __init__.py to trtllm_rollout package by @Superjomn in #5857
  • [misc] fix: license for verl/workers/rollout/trtllm_rollout/__init__.py by @tongyx361 in #5862
  • [vllm] fix: Fix vLLM synchronization error caused by SGLang skipping resume optimize by @ZLiao097 in #5866
  • [cfg] refactor: unify ppo_trainer and ppo_megatron_trainer config by @wuxibin89 in #5848
  • [fully_async] chore: Update fully async dapo qwen3-30b npu script by @wangshuyang31 in #5864
  • [doc] chore: add npu faq doc by @hustmf in #5871
  • [docker, ci] fix: all CIs, transformers upgrade to 5.3.0 and vllm==0.18.0 by @ETOgaosion in #5724
  • [doc] feat: add NVFP4 QAT documentation by @zhangyimi in #5861
  • [megatron, cfg] feat: add Qwen3.5-122B Megatron launch script by @none0663 in #5874
  • [tool] feat: verl integrate msprobe data collection by @Tjh-UKN in #5186
  • [ci] fix: rename fsdp-vlm to megatron-vlm in trtllm cleanup needs by @Superjomn in #5880
  • [megatron] fix: support critic model by @wuxibin89 in #5870
  • [trainer] fix: handle empty response_mask in calculate_debug_metrics by @Jackie2049 in #5860
  • [4/n][trainer] feat: flowgrpo - add diffusers + fsdp engine support by @zhtmike in #5802
  • [ci] fix: fix machine label for nightly_ascend.yml by @yyyy2000 in #5887
  • [cfg] fix: sync strategy from ActorConfig/CriticConfig to EngineConfig by @yifannnwu in #5885
  • [megatron] fix: enable_routing_replay fails with MLATransformerConfig… by @NoonePauseferg in #5884
  • [model] fix: replace inplace += with out-of-place addition in dummy visual forward by @reonokiy in #5881
  • [megatron] fix: ValueError when unpacking preprocess_thd_engine result in router replay by @guillemgt in #5891
  • [trainer] feat: add mindspeedllm backend engine support on NPU. by @pengnuoheng in #5680
  • [ci, trtllm] test: speed up trtllm CI by using smaller models and reducing test parameters by @shikicloud in #5856
  • [reward] fix: restore timeout in math_verify via ProcessPoolExecutor by @MaxwellJryao in #5839
  • [ckpt, trainer] feat: Add plugin hooks for custom CheckpointEngineManager and CheckpointEngine by @NaomiEisen in #5718
  • [doc] chore: Bug fixes for the qwen3-235b model in 256k scenarios by @autbuster in #5908
  • [ckpt] fix: load custom_backend_module in CheckpointEngineManager on driver by @yangspirit in #5911
  • [doc] fix: fix non‑compliant sections by @fh188 in #5913
  • [megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU by @ZLiao097 in #5904
  • [rollout] chore: bump up trtllm image version to 1.3.0rc10 by @Superjomn in #5841
  • [trainer,perf] fix: enable profiler for SFT trainer by @wuxibin89 in #5909
  • [misc] feat: Update file logger path output to absolute path by @vermouth1992 in #5924
  • [training_utils, hardware] refactor: standardize deterministic environment variables for NCCL and NPU by @xuy1234 in #5923
  • [ci] chore: add vllm_ascend.yaml by @Annarine in #5759
  • Revert "[megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU" by @wuxibin89 in #5942
  • [vllm] fix: remove redundant clone in weight refit by @wuxibin89 in #5934
  • [ci] chore: add nightly npu docker for v0.7.1 by @yyyy2000 in #5930
  • [sglang] fix: Adapting the use of _launch_subprocesses to the latest SGLang branch by @xiazhahe in #5868
  • [megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU by @ZLiao097 in #5945
  • [sglang] fix: sglang empty result problem by @Begunner in #5936
  • [docker] feat: Add GB200 (aarch64/Blackwell) Docker image and training example by @kaixih in #5596
  • [trainer] feat: add new trainer with TransferQueue by @wuxibin89 in #5401
  • [megatron] fix: MTP loss deadlock when using context parallelism by @xhx1022 in #5895
  • [ci] fix: indentation error in one step off policy e2e ci by @HollowMan6 in #5960
  • [ci] chore: Update ascend related files code owner by @FightingZhen in #5982
  • [fully_async] fix: terminated training when streaming_generation raise exception by @Zhikaiiii in #5977
  • [rollout] fix: prevent engine_kwargs from overwriting KvCacheConfig in trtllm rollout by @Superjomn in #5939
  • [ci] chore: Add veomni npu ci test by @wangshuyang31 in #5935
  • [doc] chore: add rloo advantage estimator example script for npu by @zjchenn in #5950
  • [trainer] fix: return NaN for empty tensors in compute_data_metrics by @Jackie2049 in #5899
  • [reward] feat: add compute_score timing metrics to agent loop by @Stonesjtu in #5971
  • [fully_async] feat: enable fully async to log_val_generations by @Begunner in #5988
  • [ci, vllm] chore: update vllm-omni 0.18.0 official release and Miscellaneous by @AndyZhou952 in #5809
  • [doc] fix: move low precision doc by @sophiayyya in #5994
  • [rollout, vllm] fix: auto-convert disable_mm_preprocessor_cache to mm_processor_cache_gb for vllm >= 0.13.0 by @Silas-11 in #5961
  • [fsdp] feat: qwen3.5 add npu docker file by @ruanhao566 in #5991
  • [perf] feat: simplify precision_debugger config behavior and docs by @Tjh-UKN in #5986
  • [doc] feat: move msprobe to ascend_tutorial by @tardis-key in #6004
  • [misc, fully_async] feat: add Qwen3-VL-8B fully async GRPO training script on geo3k by @Silas-11 in #6006
  • [megatron] fix: update patch for MLA flashattn forward by @HollowMan6 in #6005
  • [veomni] feat: bump veomni to v0.1.8 by @deerlu in #5900
  • [megatron] fix: add missing FP8 padding for router replay by @eternally-z in #5989
  • [megatron, trainer] fix: respect calculate_entropy config in megatron actor update by @MaxwellJryao in #6016
  • [ci] chore: add sglang new version docker for NPU by @xiazhahe in #6021
  • [data] fix: pad data in preprocess_packed_seqs if shorter than align_size by @beirong8kmiles in #6001
  • [tool, rollout, cfg] feat: per-sample tool environment routing for ToolAgentLoop by @pull-ups in #5978
  • [fsdp] feat: qwen3.5 modify npu docker file based on CANN 8.5.2 by @ruanhao566 in #6017
  • [ci] fix: update docker-build-ascend-a3-qwen3_5 by @yyyy2000 in #6022
  • [fully_async] fix: add fully async grpo qwen3-235b npu script in main branch by @wangshuyang31 in #6012
  • [veomni] feat: add DeepSeek-V3 to MOE_PARAM_HANDERS by @Luosuu in #5996
  • [ci] chore: qwen3.5 add docker file add x86 CANN8.5.2 by @ruanhao566 in #6031
  • [misc] chore: remove deprecated requirements.txt by @wuxibin89 in #6032
  • [data, trainer] fix: batch padding for multi-trajectory by @ZhentaoFan in #5969
  • [fully_async] fix: replace routed_experts on partial rollout resume i… by @NoonePauseferg in #6029
  • [veomni] fix: use local paths for VeOmni model loading by @Luosuu in #6034
  • [trainer,algo] feat: Support On-Policy Distillation in main_ppo_sync by @0oshowero0 in #5997
  • [trainer] fix: add missing rollout dump and corrected validation logging in main_ppo_sync by @guillemgt in #6024
  • [5/n][trainer] feat: flowgrpo trainer by @zhtmike in #5951
  • [trainer, rollout, algo] refactor: Remove OPD colocate mode by @JacobHelwig in #6039
  • [rollout] fix: RM sleep/wake teacher replicas by @JacobHelwig in #6041
  • [fully_async] fix: preserve per-iteration routed_experts on partial rollout resume by @NoonePauseferg in #6046
  • [misc] fix: project name refactor - volcengine -> verl by @ETOgaosion in #6053
  • [veomni] feat: support Qwen3.5 SP and add GRPO trainer demo using VeOmniEngine by @deerlu in #6061
  • [rollout] chore: single turn agent loop also enable rollout trace as tool loop by @pengwu22 in #6048
  • [trainer,cfg,rollout,algo] feat: Multi-Teacher OPD by @JacobHelwig in #6051
  • [fully_async] fix: avoid blocking ray.get inside async actor methods by @yxs in #6052
  • [rollout] feat: add inter-node TRT-LLM rollout support for trtllm by @Superjomn in #5992
  • [fully_async] fix: Add Mindspeed Patch for Async Training on Ascend NPUs by @HwCARI in #5967
  • [fully_async, rollout, trainer, tool, cfg] fix: ROCm async training compatibility for AMD MI300X by @xiaohong42 in #6062
  • [algo] fix: strip '+' suffix in kl_penalty so k3+/low_var_kl+ work by @MaxwellJryao in #6058
  • [recipe, cfg] fix: update NPU script parameters for Qwen3 GSPO and DAPO math recipes by @zjchenn in #6066
  • [fully_async] fix: fix rollouter/idle compute in async-mode by @chenjiaoAngel in #6069
  • [BREAKING] [misc] refactor: deprecate workers, migrate to engines by @ETOgaosion in #6067
  • [BREAKING] [env] refactor: deprecate verl/interactions by @ETOgaosion in #6074
  • [veomni] feat: enable VeOmni engine for on-policy distillation by @hjshi84 in #6072
  • docs: fix typo Hellasage -> HellaSwag by @nowang6 in #6084
  • [fully_async] Fix: fix fully async profiler for first step. by @Shangwei-Li in #6070
  • [fully_async] Fix: Remove _fit_torch_memory for separation trainer. by @Shangwei-Li in #6075
  • [fsdp, perf] fix: skip redundant to(cuda) and gc.collect in train_mode when offload is disabled by @evmanz in #5753
  • [ci] chore: change some npu ci test yml machine by @yyyy2000 in #6043
  • [fully_async] fix: allow drain loop to resume early on parameter sync when partial_rollout is enabled by @Begunner in #6090
  • [fsdp] feat: Qwen3.5 Adds Docker Files Based on CANN8.5.2 A2 by @ruanhao566 in #6098
  • [trainer] fix: include uid and sort by uid in validation generation dumps in main_ppo_sync by @guillemgt in #6101
  • [ci] chore: update npu docker build pipeline by @yyyy2000 in #6102
  • [ci] fix: remove spmd test by @tardis-key in #6103
  • [docker] chore: update npu v0.7.1 docker pipeline by @yyyy2000 in #6106
  • [fully_async] Fix: fix megatron save and offload in case param_offload is on by @Shangwei-Li in #6095
  • [sglang] feat: Patch sglang to support on-policy distillation teacher by @mingruimingrui in #6120
  • [sglang] feat: restrict abort and resume requests to primary server only by @AkiRusProd in #6109
  • [trainer] fix: preserve jagged tensor layout when rebuilding nested tensors with same sequence length by @huaiyizhao in #6127
  • [rollout] feat: improve error messages for malformed tool calls by @xiefan46 in #6055
  • [model] feat: support qwen35 mtp sft/rl by @zpltys in #5898
  • [rollout] fix: truncate routed_experts to response_length to match truncated response_ids by @armorbreak001 in #6089
  • [fully_async, reward] feat: enable GenRM/DisRM support in fully async training by @xiefan46 in #6044
  • [trainer] fix: support non-last ragged dim in nested tensor rebuild by @xiefan46 in #6149
  • [repo] refactor: move experimental/vla to standalone verl-vla repository by @Miical in #6162
  • [doc] fix: update dapo multi model optimization practice by @ChibiQuest in #6161
  • [algo, fsdp, megatron, cfg] fix: wire up sum_pi_squared for optimal_token_baseline by @startju in #6153
  • [fully_async] fix: fix custom reward function register on async mode by @chenjiaoAngel in #6166
  • [trainer] feat: extend CheckpointEngineManager and AgentLoopManager hooks to separation and one-step-off trainers by @NaomiEisen in #6173
  • [vllm] chore: fix sleep_level in Ascend by @wucong25 in #6170
  • [trainer] fix: convert numpy arrays to native types before dumping rollout JSONL by @nev8rz in #6167
  • [doc] fix: update dapo multi model optimization practice by @ChibiQuest in #6169
  • [doc] chore: fix ascend quick start by @wucong25 in #6174
  • [rollout,vllm] feat: split large weight into chunks in NCCL/NIXL checkpoint engine by @wuxibin89 in #6091
  • [ci] fix: engine_mindspeed_llm_rl_job switch to A3. by @pengnuoheng in #6020
  • [ci] chore: add sgl_ascend cu for NPU by @Annarine in #6085
  • [doc] chore: Add DART-GUI project to README by @Pengxiang-Li in #6192
  • [worker] fix: optimizers don't have to be on the same device as the model by @HollowMan6 in #6196
  • [trainer] fix: remove actor.dump_memory_snapshot by @tardis-key in #6198
  • [worker] fix: grad_norm as non-tensor for metrics by @HollowMan6 in #6195
  • [BREAKING][misc] refactor: migrate Diffusion RL stack to verl-omni by @SamitHuang in #6200
  • [fsdp] fix: honor mixed_precision.param_dtype in forward_step autocast (#5932) by @shivam2199 in #6150
  • [BREAKING][rollout] refactor: move LLMServerManager out of AgentLoopManager by @wuxibin89 in #6129
  • [ci] chore: bump trtllm to 1.3.0rc13 and verl to v0.7.1 by @Superjomn in #6215
  • [misc] refactor: re-format examples and deprecate old examples by @ETOgaosion in #6126
  • [megatron] fix: avoid 2x peak host memory on Megatron model offload by @acmore in #6193
  • [misc] fix: add missing __init__.py files to package directories by @guillemgt in #5209
  • [veomni] feat: use VeOmni's native return_log_probs path to compute log_probs by @Luosuu in #6184
  • [reward] refactor: refactor RM score assembly in reward loop by @zhtmike in #6242
  • [tool] fix: In the memory snapshot collection logic, opening history records is not compatible with NPU by @shaanjiangcun in #6216
  • [megatron] fix: fix seq_len pad len, and adapt to new mtp_loss api (for megatron dev branch) by @zpltys in #6206
  • [ci] chore: remove fastmcp deps for sglang ci by @wuxibin89 in #6249
  • [trainer] fix: dump all outputs in validation in main_ppo_sync by @guillemgt in #6227
  • [rollout,vllm] fix: include Ray job id in colocated weight-transfer IPC path by @timothygao8710 in #6246
  • [trainer] fix: update TorchTitanEngine for latest torchtitan API by @acisseJZhong in #6231
  • [reward] fix: compute correct rollout world size by @guillemgt in #6226
  • [fully_async, rollout] feat: enable online policy distillation in fully async training by @xiefan46 in #6056
  • [sglang] feat: SGLang Prefill-Decode disaggregated rollout by @yxs in #6117
  • [rollout] fix: guard sglang profiling when self.tokenizer_manager is None by @LeiDing191 in #6217
  • [tool] feat: Memory snapshot collection, add functionality to clear history after collection. by @shaanjiangcun in #6248
  • [doc] add RandOpt to readme awesome work by @sunrainyg in #6257
  • [rollout] fix: trtllm rollout docker image and a few scripts by @hchings in #6230
  • Revert "[reward] fix: compute correct rollout world size" by @wuxibin89 in #6258
  • [doc] feat: add code reviewer by @ArronHZG in #6260
  • [trainer] fix: write request_id to reward_extra_infos_to_dump instead of reward_extra_infos_dict by @boundless-future in #6251
  • [ci] fix: remove the redundant config by @yyyy2000 in #6253
  • [rollout] feat: enable Async RL for trtllm rollout by @hchings in #5631
  • [ci] chore: bump trtllm to 1.3.0rc14 and pin mbridge by @Superjomn in #6262
  • [veomni] feat: add Qwen3.5-122b-a10b GRPO trainer demo with EP enabled using VeOmniEngine by @deerlu in #6264
  • [docker] feat: bump aarch64 vllm 0.17->0.18 by @kaixih in #6222
  • [tool] feat: simpler function-based tool registration by @Begunner in #6189
  • [doc] fix: router_replay is now under megatron by @HollowMan6 in #6272
  • [reward, cfg] fix: correctly use RewardModelConfig in reward config by @guillemgt in #6265
  • [ci] chore: slim down TRT-LLM CI by @Superjomn in #6275
  • [doc] fix: correct module path in fully_async_policy documentation by @GJWu-zyx in #6287
  • [megatron] fix: fix bugs when using position_ids in cp by @Kite0011 in #6267
  • [ci] feat: add gspo qwen3-30b in nightly npu ci by @yyyy2000 in #6273
  • [rollout] fix: skip_tokenizer_init=True for OPD teacher by @wuxibin89 in #6296
  • [tool] refactor: tools will be initialized in AgentLoopWorker by @Begunner in #6300
  • [reward, trainer] feat: support multi-output trajectories in async reward scoring by @guillemgt in #6228
  • [BREAKING][tool, data] refactor: remove curriculum sampler + dynamic dataset + tool examples by @Begunner in #6302
  • [model] chore: refactor npu scripts by @wucong25 in #6285
  • [doc] chore: update vllm and vllm ascend from 0.13.0 to 0.18.0 in docs and dockerfile by @wangshuyang31 in #6291
  • [megatron] fix: the NPU error that occurs after migrating from megatron worker to engine worker. by @xiazhahe in #6135
  • [trainer] fix: combine REMAX sampled and greedy samples in one rollout request by @liziniu in #6308
  • [vllm] refactor: MXFP8 support for ascend NPU by @quancs in #6307
  • [megatron] feat: support Megatron-FSDP mode for Megatron backend by @conver334 in #5423
  • [ci] chore: bump trtllm CI image to 1.3.0rc14 by @Superjomn in #6269
  • [fully_async] feat: reuse trainer worker group for hybrid rollout to do validation by @ArronHZG in #6076
  • [misc] refactor: re-format npu examples by @beirong8kmiles in #6286
  • [ci] chore: add sglang ci for NPU by @xiazhahe in #6015
  • [worker] feat: support log memory in engine worker by @yyyy2000 in #6270
  • [doc] refactor: ascend doc refactor of precision guide and dockerfile build guidance by @yyyy2000 in #6298
  • [data] fix: forward apply_chat_template_kwargs to system prompt measurement by @MohammadShahdad in #6305
  • [fsdp] fix: FSDP2 silently drops fsdp_config.forward_prefetch by @memset0 in #6317
  • [megatron] fix: fix bug with mcore0.12.1 + torch2.9.0 by @yyyy2000 in #6322
  • [data, rollout] feat: add audio data support by @SanftMonster in #6276
  • [doc] refactor: Ascend docs rectification, add parameter and metrics descriptions by @nuerxiati in #6294
  • [doc] refactor: Ascend docs rectification, add new FAQ questions by @nuerxiati in #6328
  • [model, fsdp] fix: Fix modeling_qwen2_5_vl missing attribute 'Qwen2RMSNorm' by @ZhuYajun-AI in #5901
  • [fsdp, fully_async] feat: add Qwen3-VL-30B-A3B fully async GRPO training script on geo3k by @zhihaofang1017 in #6131
  • [fsdp, fully_async] fix: fix CI import fast_pos_embed_interpolate in Qwen3-VL by @zhihaofang1017 in #6332
  • [docker,vllm] feat: Enable DeepEP in ARM stable image by @kaixih in #6326
  • [misc] chore: Update for vexact release by @pengwu22 in #6336
  • [model, fsdp] fix: honor SP-rolled labels in fused kernels (#6068) by @shivam2199 in #6268
  • [fsdp, ckpt] fix: drop tied target keys before HF save_pretrained by @ChangyiYang in #6334
  • [fsdp] fix: lenient resolution of _no_split_modules in get_fsdp_wrap_policy by @SteadfastAsArt in #6290
  • [doc] chore: split install guidance and quickstart by @Mengyuyang in #6337
  • [trainer] feat: support ReMax in synchronous TransferQueue trainer by @liziniu in #6340
  • [doc] chore: add npu advanced features by @wucong25 in #6339
  • [doc] refactor: added the document that collects statistics on models and algorithms that support the NPU by @zhouhengan1211 in #6347
  • [ci] fix: change model_path to local cache dir by @wuxibin89 in #6351
  • [fsdp] fix: build no-padding attention mask from input ids by @anzhsoft in #6345
  • [fsdp] fix: emit distillation outputs in use_remove_padding=False path (#6293) by @abinggo in #6350
  • [tool] fix: tool response truncate side by @haoyang9804 in #6313
  • [algo] fix: vectorized grpo low-variance scaling by @haoyang9804 in #6348
  • [doc] chore: verl Ascend doc refactor by @hustmf in #6353
  • [doc] chore: NPU model migration guidance by @Mind-s in #6330
  • [doc] chore: fix ascend doc link by @hustmf in #6359
  • Revert "[fsdp] fix: emit distillation outputs in use_remove_padding=False path (#6293)" by @wuxibin89 in #6360
  • [fsdp, ckpt] chore: fold drop_tied_target_keys into top-level import by @ChangyiYang in #6356
  • [doc] chore: OPD docs by @JacobHelwig in #6358
  • [hardware] add DT flops by @brook-cpp in #6363
  • [doc] fix: delete sglang_multiturn by @xvlincaigou in #6379
  • [trainer,data] fix: Support merging extra_info for main_ppo_sync & update TQ dependency by @0oshowero0 in #6354
  • [trainer] feat: add set_expandable_segments support for npu by @ji-huazhong in #6346
  • [trainer] feat: deprecate main_ppo.py warning by @wuxibin89 in #6384
  • [trainer] feat: async generation dump with exception propagation and streaming write by @Jackie2049 in #6324
  • [doc] chore: announce VeRL-Omni pre-release in README News by @SamitHuang in #6390
  • [doc] refactor: update rocm doc by @mingjielu in #6388
  • [fsdp] fix: emit distillation outputs in use_remove_padding=False path by @abinggo in #6386
  • [ci] fix: use TRTLLM_TEST_MODEL_PATH_ROOT in test_trtllm_rollout_utils by @Superjomn in #6385
  • [misc] fix: use directory-symlink layout for shared skills by @tongyx361 in #6391
  • [veomni] feat: Add GRPO training scripts for Qwen3-VL-30B-MOE (VeOmni Backends) by @phdddd in #5275
  • [veomni] feat: add veomni qwen3-30b and fix ep by @wangshuyang31 in #6323
  • [doc] chore: Update Ascend Docker build guidance by @anzhsoft in #6399
  • [megatron, cfg] feat: add Qwen3.5-35B Megatron-Bridge launch script on Ascend by @Zhang1Sheng in #6318
  • [perf, hardware] feat: NPU supports Liger-Kernel by @zheliuyu in #6244
  • [fsdp] feat: Support zero2 optional feature for FSDP1 in engine worker. by @ZLiao097 in #6410
  • Update installation instruction for TransferQueue by @emmericp in #6420
  • [tool] feat: add Gemma4 tool parser with stop token and response formatting by @nanastassacos in #6406
  • [fsdp] fix: sort buffers in fsdp2_load_full_state_dict to prevent NCCL deadlock with heterogeneous buffers by @nanastassacos in #6405
  • [megatron] chore: refactor to use Megatron-Bridge new APIs by @HollowMan6 in #6335
  • [doc] chore: Update ascend doc link in README.md by @hustmf in #6427
  • [megatron] fix: set use_mbridge to True for some npu scripts by @zjchenn in #6429
  • [doc] fix: fix ascend dockerfile_build_guidance as issue by @yyyy2000 in #6400
  • [misc] fix: device variable not bound in some scripts by @zjchenn in #6430
  • [megatron, cfg] feat: add Qwen3-VL-30B mbridge launch script on Ascend by @Seren-hao in #6443
  • [docker] chore: update vllm 0.20.2 image by @ETOgaosion in #6393
  • [sglang, one_step_off] fix: add free_cache_engine guard to resume_kv_cache by @dafu-wu in #6442
  • [ckpt, model] fix: save LoRA train metadata for PPO actor checkpoint by @Yatogaii in #6409
  • [fully_async] fix: initialize _dump_executor in FullyAsyncTrainer and FullyAsyncRollouter by @nanastassacos in #6438
  • [ci] fix: the qwen3 model replaces the qwen25 model by @daikang6 in #6398
  • [rollout] fix: fix fp8 for async RL and multinode rollout by @hchings in #6344
  • [veomni] feat: add VeOmni-native critic support by @Luosuu in #6453
  • [fully_async,doc] feat: rm future plans, almost all completed. by @ArronHZG in #6457
  • [rollout] feat: add Trackio rollout trace logging by @abidlabs in #6423
  • [model] refactor: clean up outdated Qwen2_5_vl code implementation by @ji-huazhong in #6445
  • [megatron,rollout] fix: align MTP loss and rollout metrics by @xhx1022 in #6432
  • [docker] fix: align stable vllm ARM version with x86 by @kaixih in #6460
  • [ci, trainer] fix: fix code, scripts and st for mindspeedllm backend by @pengnuoheng in #6316
  • [fsdp] fix: device mismatch between fsdp2 offload and weights transfer by @ETOgaosion in #6463
  • [ci] chore: update npu docker to cann 9.0.0 by @yyyy2000 in #6466
  • [veomni] feat: wire MoE load-balance monitor into VeOmni engine by @Luosuu in #6470
  • [fully_async] feat: fully async profiling by @tardis-key in #6461
  • [fsdp, megatron, trainer] feat: add top-k distillation overlap metrics by @Turingzero0 in #6469
  • [trainer, rollout, cfg] feat: add extension points for custom worker configs by @Luosuu in #6489
  • [veomni] fix: VeOmniEngineWithValueHead loads ForTokenClassification by @Luosuu in #6488
  • [trainer] fix: gracefully shutdown trainer with TransferQueue by @wuxibin89 in #6491
  • [veomni] feat: add MoE router replay (R2/R3) support by @hjshi84 in #6325
  • [fsdp, model] feat: support qwen3_5 ulysses sp by @SaltFish11 in #6482
  • [ci] chore: requirements-npu add triton-ascend by @wucong25 in #6493
  • [rollout, vllm] fix: use engine.sleep() instead of collective_rpc by @dafu-wu in #6456
  • [ci] chore: triton-ascend==3.2.1 need install path in docker by @yyyy2000 in #6498
  • [docker] chore: upgrade sglang to 0.5.12 by @ETOgaosion in #6435
  • [rollout, vllm] fix: treat null rollout seed as 0 for engine init by @SamitHuang in #6503
  • [fsdp] fix: add sp and use_remove_padding validation for SFT and RL in fsdp engine by @fisherxu in #6502
  • [rollout] feat: enable MooncakeStoreConnector with hard-reset on weight update by @aoshen02 in #6373
  • [megatron] feat: ascend bump into megatron 016 by @wangshuyang31 in #6374
  • [megatron, trainer] fix: preserve BSHD top-k distillation shape by @anzhsoft in #6506
  • [hardware] chore: remove redundant Dockerfile.rocm7 by @Vivicai1005 in #6514
  • [cfg] fix: align dataclass defaults with yaml by @anzhsoft in #6494
  • [doc] feat: add uni-agent release to readme by @yyDing1 in #6516
  • [ci] fix: Update Dockerfile.ascend_9.0.0_a3 by @yyyy2000 in #6517
  • [ci] fix: remove the uninstall of triton in ascend docker by @yyyy2000 in #6524
  • [ci] chore: add npu sglang nightly ci by @hustmf in #6521
  • [veomni, fsdp] feat: enable fused top-K distillation kernel for OPD by @Luosuu in #6511
  • [doc] chore: point AMD ROCm section at amd_quick_start.rst by @Vivicai1005 in #6518
  • [ckpt] feat: pass global steps to checkpoint engines by @athreesh in #6507
  • [doc] refactor: optimize ascend doc by @hustmf in #6532
  • [doc] chore: fix verl ascend readme by @wucong25 in #6534
  • [ci] chore: npu ci use cann9.0.0 by @daikang6 in #6520
  • [veomni] fix: VeOmniEngineWithValueHead token-cls lookup default value for transformers v5 by @Luosuu in #6540
  • [model] fix: support trl>=0.29 AutoModelForCausalLMWithValueHead import by @wenzhaoabc in #6539
  • [ci] fix: FSDP actor/critic ci fail by @wuxibin89 in #6550

New Contributors

  • @Gary-cjy made their first contribution in #5215
  • @AnikiFan made their first contribution in #5550
  • @Fan-Yunfan made their first contribution in #5464
  • @HuiyingLi made their first contribution in #5407
  • @knlnguyen1802 made their first contribution in #5616
  • @Luosuu made their first contribution in #5662
  • @koanho made their first contribution in #5705
  • @Solus-sano made their first contribution in #5689
  • @yyZhangAI made their first contribution in #5725
  • @stas00 made their first contribution in #5735
  • @FrankHo-Hwc made their first contribution in #5742
  • @chenyingshu made their first contribution in #5713
  • @JenniferWang made their first contribution in #5591
  • @zhijie-os made their first contribution in #5756
  • @zyang6 made their first contribution in #5556
  • @beirong8kmiles made their first contribution in #5734
  • @0lynnlin0 made their first contribution in #5791
  • @Zhang1Sheng made their first contribution in #5682
  • @AndyZhou952 made their first contribution in #5716
  • @SanftMonster made their first contribution in #5823
  • @farazkh80 made their first contribution in #5635
  • @Tjh-UKN made their first contribution in #5186
  • @Jackie2049 made their first contribution in #5860
  • @zhtmike made their first contribution in #5802
  • @yifannnwu made their first contribution in #5885
  • @NoonePauseferg made their first contribution in #5884
  • @reonokiy made their first contribution in #5881
  • @pengnuoheng made their first contribution in #5680
  • @shikicloud made their first contribution in #5856
  • @MaxwellJryao made their first contribution in #5839
  • @NaomiEisen made their first contribution in #5718
  • @yangspirit made their first contribution in #5911
  • @fh188 made their first contribution in #5913
  • @xuy1234 made their first contribution in #5923
  • @kaixih made their first contribution in #5596
  • @Zhikaiiii made their first contribution in #5977
  • @Stonesjtu made their first contribution in #5971
  • @ruanhao566 made their first contribution in #5991
  • @deerlu made their first contribution in #5900
  • @pull-ups made their first contribution in #5978
  • @ZhentaoFan made their first contribution in #5969
  • @yxs made their first contribution in #6052
  • @HwCARI made their first contribution in #5967
  • @xiaohong42 made their first contribution in #6062
  • @nowang6 made their first contribution in #6084
  • @evmanz made their first contribution in #5753
  • @AkiRusProd made their first contribution in #6109
  • @xiefan46 made their first contribution in #6055
  • @armorbreak001 made their first contribution in #6089
  • @startju made their first contribution in #6153
  • @nev8rz made their first contribution in #6167
  • @Pengxiang-Li made their first contribution in #6192
  • @SamitHuang made their first contribution in #6200
  • @shivam2199 made their first contribution in #6150
  • @acmore made their first contribution in #6193
  • @shaanjiangcun made their first contribution in #6216
  • @timothygao8710 made their first contribution in #6246
  • @LeiDing191 made their first contribution in #6217
  • @sunrainyg made their first contribution in #6257
  • @boundless-future made their first contribution in #6251
  • @GJWu-zyx made their first contribution in #6287
  • @MohammadShahdad made their first contribution in #6305
  • @memset0 made their first contribution in #6317
  • @ZhuYajun-AI made their first contribution in #5901
  • @SteadfastAsArt made their first contribution in #6290
  • @Mengyuyang made their first contribution in #6337
  • @zhouhengan1211 made their first contribution in #6347
  • @anzhsoft made their first contribution in #6345
  • @abinggo made their first contribution in #6350
  • @haoyang9804 made their first contribution in #6313
  • @brook-cpp made their first contribution in #6363
  • @mingjielu made their first contribution in #6388
  • @phdddd made their first contribution in #5275
  • @emmericp made their first contribution in #6420
  • @nanastassacos made their first contribution in #6406
  • @dafu-wu made their first contribution in #6442
  • @abidlabs made their first contribution in #6423
  • @Turingzero0 made their first contribution in #6469
  • @SaltFish11 made their first contribution in #6482
  • @fisherxu made their first contribution in #6502
  • @aoshen02 made their first contribution in #6373
  • @Vivicai1005 made their first contribution in #6514
  • @athreesh made their first contribution in #6507
  • @wenzhaoabc made their first contribution in #6539

Full Changelog: v0.7.1...v0.8.0