verl-project/verl v0.8.0 on GitHub

Highlights

Training

Megatron

Megatron-FSDP mode for the Megatron backend (#5423).
Dynamic context parallelism (CP) (#5057) and CP support for the BSHD format (#5826).
Save checkpoints in HF PEFT format (#5575).
Qwen3.5 MTP support for SFT and RL (#5898).
NVFP4 (W4A16) QAT training via ModelOpt (#5254) with QAT documentation (#5861).
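For readers new to NVFP4, here is a minimal fake-quantization sketch of the weight path in W4A16 QAT. Weights are rounded to the FP4 (E2M1) grid with one scale per block of 16 values and then dequantized right away, so the forward pass sees the quantization error while activations stay in 16-bit. This is a pure-Python illustration, not ModelOpt's implementation. It makes two simplifications: the block scale is kept in full precision (real NVFP4 stores it as FP8 E4M3 together with a per-tensor scale), and the straight-through estimator used for gradients is left out.

```python
import math

# Non-negative magnitudes representable by FP4 E2M1; the sign is handled separately.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across each block of 16 values


def _round_to_grid(x: float) -> float:
    """Round |x| to the nearest E2M1 magnitude, keeping the sign of x."""
    magnitude = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(magnitude, x)


def fake_quant_nvfp4(weights: list[float]) -> list[float]:
    """Quantize-then-dequantize a flat weight list block by block.

    Simplification: the per-block scale stays in full precision here,
    whereas real NVFP4 stores it as FP8 (E4M3) plus a per-tensor scale.
    """
    out: list[float] = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start : start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        # Map the block's largest magnitude onto the largest E2M1 value (6.0).
        scale = amax / 6.0 if amax > 0 else 1.0
        out.extend(_round_to_grid(w / scale) * scale for w in block)
    return out
```

For example, `fake_quant_nvfp4([6.0, 1.2, -2.9])` returns `[6.0, 1.0, -3.0]`. The block maximum fixes the scale at 1.0, and each remaining value snaps to its nearest E2M1 magnitude.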

VeOmni

VeOmni engine for on-policy distillation (#6072) with native critic support (#6453) and native return_log_probs path (#6184).
MoE router replay (R2/R3) support (#6325) and MoE load-balance monitoring (#6470).
Qwen3.5 sequence parallelism (SP) and GRPO demos for Qwen3-VL-30B-MoE, Qwen3.5-122B-A10B and Qwen3-30B (#6061, #5275, #6264, #6323).
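To illustrate the kind of numbers an MoE load-balance monitor can report, here is a minimal sketch that turns per-token routing decisions into per-expert load statistics. The input layout (a list of top-k expert ids per token) and the metric names are assumptions. They are not the VeOmni engine's actual logged metrics.

```python
from collections import Counter


def expert_load_stats(routed_experts: list[list[int]], num_experts: int) -> dict:
    """Summarize how evenly routed tokens are spread across experts.

    routed_experts[i] lists the top-k expert ids selected for token i.
    The metric names below are illustrative, not VeOmni's logged keys.
    """
    counts = Counter(e for token_experts in routed_experts for e in token_experts)
    tokens_per_expert = [counts[e] for e in range(num_experts)]
    mean_load = sum(tokens_per_expert) / num_experts
    return {
        "tokens_per_expert": tokens_per_expert,
        # 1.0 means perfectly balanced; larger values indicate a hot expert.
        "max_over_mean": max(tokens_per_expert) / mean_load if mean_load else 0.0,
        # Experts that received no tokens at all in this batch.
        "unused_experts": sum(1 for load in tokens_per_expert if load == 0),
    }
```

For `[[0, 1], [0, 2], [0, 1], [0, 3]]` over 5 experts, `tokens_per_expert` is `[4, 2, 1, 1, 0]`. Expert 0 carries 2.5x the mean load, and one expert sits idle.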

Hardware / NPU

NPU support for Liger-Kernel (#6244).
Expandable segments support for NPU memory allocation (#5795, #6346).
MXFP8 rollout on Ascend 950 (#5756).

Rollout

vLLM

Split large weights into chunks in NCCL/NIXL checkpoint engine (#6091).
MooncakeStoreConnector with hard-reset on weight update (#6373).
Upgrade stable version to vllm==0.20.2 (#6393).
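Here is a minimal sketch of the chunking idea. A weight too large for a fixed-size transfer buffer is sent as bounded pieces tagged with byte offsets, and the receiver reassembles them. Raw `bytes` stand in for a tensor's flattened storage. `iter_weight_chunks` and `reassemble` are hypothetical helpers, and the sketch does not model the actual NCCL/NIXL transport in #6091.

```python
from typing import Iterable, Iterator


def iter_weight_chunks(
    name: str, data: bytes, max_chunk_bytes: int
) -> Iterator[tuple[str, int, bytes]]:
    """Yield (name, byte_offset, chunk) pieces of at most max_chunk_bytes each."""
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    view = memoryview(data)  # slicing a memoryview is zero-copy; bytes() copies once per chunk
    for offset in range(0, len(view), max_chunk_bytes):
        yield name, offset, bytes(view[offset : offset + max_chunk_bytes])


def reassemble(chunks: Iterable[tuple[str, int, bytes]]) -> bytes:
    """Rebuild one weight from its chunks, tolerating out-of-order arrival."""
    buf = bytearray()
    for _, offset, chunk in sorted(chunks, key=lambda c: c[1]):
        if offset != len(buf):
            raise ValueError(f"missing bytes before offset {offset}")
        buf.extend(chunk)
    return bytes(buf)
```

With `max_chunk_bytes=4`, a 10-byte weight goes out as chunks of 4, 4 and 2 bytes at offsets 0, 4 and 8.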

SGLang

LoRA support for SGLang rollouts (merge + native adapter paths) (#5564).
SGLang Prefill-Decode disaggregated rollout (#6117).
Upgrade stable version to sglang==0.5.12 (#6435).

TensorRT-LLM

Async RL for TRT-LLM rollout, including inter-node support (#5631, #5992).

On-Policy Distillation (OPD)

End-to-end on-policy distillation support across FSDP, Megatron and VeOmni backends (#5041).
Supports text and multimodal models, single or multiple teachers (#6051), and both the sync (#5997) and fully async (#6056) trainers.
Reverse KL loss and forward top-k KL loss, with a fused top-k distillation kernel (#6511) and top-k overlap metrics (#6469).
Docs: https://verl.readthedocs.io/en/latest/algo/opd.html
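The following is a minimal per-position sketch of the reverse KL and forward top-k KL losses above, plus one possible overlap metric. It assumes log-probabilities over a tiny vocabulary held in Python lists. The function names, the choice not to renormalize the truncated teacher distribution, and the overlap definition are all assumptions for illustration. The fused kernel in #6511 and the metrics in #6469 may define them differently.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over a (toy) vocabulary."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def _topk_ids(logp: list[float], k: int) -> list[int]:
    return sorted(range(len(logp)), key=lambda v: logp[v], reverse=True)[:k]


def reverse_kl_sampled(student_logp: list[float], teacher_logp: list[float], token: int) -> float:
    """One-sample estimate of KL(student || teacher) at a single position.

    `token` is assumed to be sampled from the student (on-policy), which makes
    log p_s(token) - log p_t(token) an unbiased estimate of the reverse KL.
    """
    return student_logp[token] - teacher_logp[token]


def forward_topk_kl(student_logp: list[float], teacher_logp: list[float], k: int) -> float:
    """KL(teacher || student) summed over the teacher's top-k tokens only.

    This variant does not renormalize the truncated teacher distribution.
    """
    return sum(
        math.exp(teacher_logp[v]) * (teacher_logp[v] - student_logp[v])
        for v in _topk_ids(teacher_logp, k)
    )


def topk_overlap(student_logp: list[float], teacher_logp: list[float], k: int) -> float:
    """Fraction of the teacher's top-k tokens that also appear in the student's top-k."""
    shared = set(_topk_ids(student_logp, k)) & set(_topk_ids(teacher_logp, k))
    return len(shared) / k
```

The two losses differ in which distribution the expectation is taken under. The reverse KL needs only the teacher's log-prob of the token the student actually sampled. The forward term needs the teacher's top-k distribution at every position.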

Trainer

Sync Trainer

New sync trainer with TransferQueue to decouple control flow and data flow in the single controller (#5401).
Up to 2x speedup in multimodal training on 128 GPUs; see the TransferQueue blog post.
Support multiple output trajectories per agent loop.
Planned: a fully async trainer with TransferQueue is expected in the next release.
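Decoupling control flow from data flow can be shown with a toy, single-process sketch. The controller only passes lists of sample keys from stage to stage. Each stage pulls the fields it needs from a shared store and writes its results back. `DataStore`, the stage functions and `controller` are hypothetical stand-ins, not TransferQueue's API, and the "generation" and "reward" steps are placeholders.

```python
import uuid
from typing import Callable


class DataStore:
    """Toy stand-in for a data plane: payloads keyed by sample id."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def put(self, row: dict) -> str:
        key = uuid.uuid4().hex
        self._rows[key] = dict(row)
        return key

    def get(self, keys: list[str], fields: list[str]) -> list[dict]:
        # A stage fetches only the columns it needs.
        return [{f: self._rows[k][f] for f in fields} for k in keys]

    def update(self, key: str, **fields) -> None:
        self._rows[key].update(fields)


Stage = Callable[[DataStore, list[str]], None]


def rollout_stage(store: DataStore, keys: list[str]) -> None:
    for key, row in zip(keys, store.get(keys, ["prompt"])):
        store.update(key, response=row["prompt"].upper())  # placeholder "generation"


def reward_stage(store: DataStore, keys: list[str]) -> None:
    for key, row in zip(keys, store.get(keys, ["response"])):
        store.update(key, reward=float(len(row["response"])))  # placeholder reward


def controller(store: DataStore, prompts: list[str], stages: list[Stage]) -> list[str]:
    # Control flow: only the list of keys travels through the controller.
    keys = [store.put({"prompt": p}) for p in prompts]
    for stage in stages:
        stage(store, keys)
    return keys
```

The controller never holds prompts, responses or rewards. This is the property that lets payloads move directly between producers and consumers in a distributed setting.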

Fully Async Trainer

Reuse trainer worker group for hybrid rollout during validation (#6076).
Online policy distillation in fully async training (#6056).
Qwen3-VL-30B-A3B and Qwen3-VL-8B fully async GRPO training scripts (#6131, #6006).

Agentic RL Training

New project verl-project/uni-agent for building, running, and training general agents at scale.
Supports RL training for coding, search and GUI agents.

Tools & Reward

Simpler function-based tool registration (#6189) with per-sample tool environment routing (#5978) and Gemma4 tool parser (#6406).
Multi-step support in skip_rollout v2 (#5556) and improved error messages for malformed tool calls (#6055).
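Here is a minimal sketch of what function-based tool registration can look like. A decorator derives a JSON-schema-style spec from a plain Python function's signature, type hints and docstring, and a dispatcher calls the function by name with parsed arguments. The `tool` decorator, `TOOL_REGISTRY` and `call_tool` are hypothetical names; see #6189 for verl's actual interface.

```python
import inspect
from typing import Callable, get_type_hints

# Hypothetical mapping from Python annotations to JSON-schema type names.
_PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}
TOOL_REGISTRY: dict[str, dict] = {}


def tool(fn: Callable) -> Callable:
    """Register fn as a tool, deriving its schema from the signature."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _PY_TO_JSON.get(hints.get(name, str), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    TOOL_REGISTRY[fn.__name__] = {
        "schema": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
        "fn": fn,
    }
    return fn


def call_tool(name: str, arguments: dict):
    """Dispatch a parsed tool call (name + JSON arguments) to the registered function."""
    return TOOL_REGISTRY[name]["fn"](**arguments)


@tool
def add(a: int, b: int = 0) -> int:
    """Add two integers."""
    return a + b
```

In this sketch, registering a tool only requires writing an annotated function. The schema shown to the model and the dispatch path both come from that single definition.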

Breaking Changes

Deprecate the legacy FSDP and Megatron workers and migrate to the unified engine abstraction (#5604, #6067).
Deprecate verl/interactions (#6074).
Move LLMServerManager out of AgentLoopManager (#6129).
Migrate Diffusion RL stack to verl-omni (#6200) and experimental/vla to a standalone verl-vla repository (#6162).
Remove curriculum sampler + dynamic dataset + tool examples (#6302).
main_ppo.py is deprecated in favor of main_ppo_sync.py and now emits a deprecation warning (#6384).

What's Changed

[ci] feat: add profiling tests to vLLM ci by @Gary-cjy in #5215
[config] fix: sync chat_template from tokenizer to processor for multimodal base models (e.g. Qwen3.5) by @khazic in #5612
[rollout] fix: DP > 1 hang with vllm rollout when training dense or MoE(EP=1) model by @NascentAscension in #5609
[doc] refactor: rearrange ascend doc by @hustmf in #5620
[Megatron] feat: Support compatibility enhancements of vp_stage by @ChibiQuest in #5580
[model] chore: Fix Qwen3-235B precision issues on NPU by @autbuster in #5610
[data, doc, misc] fix: fix outdated config keys in example scripts and docs by @AnikiFan in #5550
[ci] chore: add npu one step off megatron ci and fix fsdp ci by @wucong25 in #5510
[single_controller] fix: correct spelling error 'procecss' -> 'process' by @Fan-Yunfan in #5464
[ci] fix: full async npu ci test by @ETOgaosion in #5647
[doc] fix: modify ascend_quick_start for submodule recipe by @xiazhahe in #5642
[algo, trainer] fix: pass missing old_log_probs to OTB estimators by @dubin555 in #5615
[ci] fix: mcore deepseek CI test by @ETOgaosion in #5658
[algo] feat: SAC Performance Improvements for Pi0.5 by @Miical in #5645
[data] feat: optimized text filter process speed on transformer>=5.3.0 and run qw3.5 + aime data by @chenjiaoAngel in #5632
[vllm] chore: fix mc2 used in vllm_ascend on A2 npu by @wucong25 in #5560
[vllm] fix: npu disable flash_attn for RotaryEmbedding by @Mind-s in #5640
[trainer] feat: Add Nemo-Automodel as alternative training engine by @HuiyingLi in #5407
[fully_async] fix: fully_async ckpt save bug by @sl-1314 in #5677
[megatron] fix: add megatron checkpoint patch by @Begunner in #5251
[sglang,fsdp] feat: LoRA support for SGLang rollouts (merge + native adapter paths) by @cavities12 in #5564
[1/n][vllm, rollout] feat: flowgrpo - support vllm-omni as rollout backend for verl by @knlnguyen1802 in #5616
[training_utils] feat: use flash_attn cross_entropy loss in FusedLinearForPPO by @Luosuu in #5662
[rollout,trtllm] fix: trtllm multinode rollout by @hchings in #5693
[fsdp,megatron,vllm,trainer,algo] feat: On-Policy Distillation by @JacobHelwig in #5041
[trainer] fix: fixed an issue where mindspeed's backend context parallelism feature functioned incorrectly by @ji-huazhong in #5697
[megatron] fix: remove llama and qwen2 files by @ChengQianqian in #5707
[trtllm,rollout] fix hang issue from VLM codepath by @hchings in #5701
[vllm] fix: fp8 utils with vllm15 for moe model by @sophiayyya in #5661
[one_step_off] fix: fix one-step-off update weights before rollout finished by @wucong25 in #5698
[megatron, ckpt] fix: set dist_ckpt_optim_fully_reshardable default to False by @koanho in #5705
[fsdp, perf, doc] fix: fix Liger integration for VL models and RL training, allowing liger speed improvement by @EricMarcus-ai in #5669
[perf, trainer, training_utils, ray, worker] fix: Add set_numa_affinity() for engine workers: TrainingWorker. by @sheilaliuxl in #5627
[megatron, vllm] feat: NVFP4 (W4A16) QAT training support via ModelOpt by @jQizhang in #5254
[training_utils] fix: use response_lens.max() instead of offsets().max() for nested tensor max_response_len by @dubin555 in #5699
[fsdp] fix: avoid NestedTensor jagged dim ambiguity for 3D position_ids by @Solus-sano in #5689
[ci] fix: fix various ci failure by @wuxibin89 in #5717
[rollout] fix: enable FP8 quantization for SGLang rollout in fully async mode. by @eternally-z in #5675
[vllm] feat: Add support for the Qwen3_5MoeForCausalLM model On Ascend by @mikequan0425 in #5652
[trainer] fix: skip dataloader state restore when resuming at epoch boundary by @yyZhangAI in #5725
[algo] feat: Implement IcePop in rollout correction by @HollowMan6 in #5722
[fully_async] chore: Add fully async dapo qwen3-30b npu script by @wangshuyang31 in #5653
[model] fix: An end-to-end script for the 235b model is provided for the 256k long sequence by @autbuster in #5733
[model] chore: Corrected the description of errors related to the 235b script and fixed the error in running the sft script. by @autbuster in #5732
[misc] fix: make the assert user-friendly for get_tensordict by @stas00 in #5735
[ci] fix: fix circular import in ci by @vermouth1992 in #5736
[1/2][rollout,trainer] refactor: Teacher colocate mode -- Move teacher logprob computation to AsyncTeacherLLMServerManager by @JacobHelwig in #5723
[misc] fix: supplement the dependencies that are missing in the requirements-npu.txt by @nuerxiati in #5740
[ckpt] fix: handle string task_type in LoRA model merger by @FrankHo-Hwc in #5742
[ci] chore: delete install current repository for npu ci by @yyyy2000 in #5748
[BREAKING][trainer] feat: deprecate legacy engine fsdp and megatron workers by @wuxibin89 in #5604
[2/2][rollout,trainer] feat: Teacher colocate mode by @JacobHelwig in #5745
[trtllm, rollout] fix: partial loading logic by @hchings in #5728
[trainer] fix: convert numpy types to native Python types in MultiTurnSFTDataset by @khazic in #5743
[3/n][reward] feat: flowgrpo - support image-based rewards (rule-based & genrm) by @chenyingshu in #5713
[ci] chore: delete mirror for npu ci by @yyyy2000 in #5758
[fsdp] fix: pass dp_group to prepare_dynamic_batch to fix CUDA deadlock by @JenniferWang in #5591
[megatron] feat: checkpoint save as HF PEFT format by @HollowMan6 in #5575
[doc] refactor: add constraints on the use of vpp and mbridge parameters by @zjchenn in #5763
[fully_async] fix: Patch vllm013 weight loader for qwen3-moe series by @wangshuyang31 in #5695
[hardware, rollout] feat: enable MXFP8 rollout on Ascend 950 devices (DV100 & DV120) by @zhijie-os in #5756
[rollout, tool] feat: support multi-step in skip_rollout v2 by @zyang6 in #5556
[trainer] fix: MLFlow publishing metrics failure should be non-blocking. by @sheilaliuxl in #5771
[ci] chore: add npu nightly ci for dapo-moonlight-16b-megatron and modify log path by @beirong8kmiles in #5734
[trainer] feat: support use_remove_padding=False for mindspeed backend by @ji-huazhong in #5768
[ci] fix: resolve oom when allocating weight transfer buffer in fully async test cases by @0lynnlin0 in #5791
[fsdp, model] feat: add qwen3.5 fsdp grpo training support. by @Zhang1Sheng in #5682
[2/n][rollout] feat: flowgrpo - add diffusion agent loop support by @AndyZhou952 in #5716
[trainer] feat: enable expandable segment support for npu by @ji-huazhong in #5795
[ci] feat: support Ascend A2/A3 docker image build pipeline for sglang by @xiazhahe in #5804
[tool] chore: remove hard-code tool agent loop in fully async by @yyDing1 in #5816
[megatron] feat: support dynamic CP by @ISEEKYAN in #5057
[sglang, rollout] fix: wire up LoRA adapter path for engine_workers + sglang sleep by @cavities12 in #5769
[env] fix: Modify the package installation sequence in the Ascend installation guide by @nuerxiati in #5819
[rollout] fix: processor does not have image_processor. by @SanftMonster in #5823
[single_controller] fix: Set device_name for split_resource_pool to prevent failure on NPU environments by @0oshowero0 in #5824
[megatron] fix: pass use_distributed_optimizer to ddp_config in vanilla mbridge path by @khazic in #5775
[reward] fix: disable signal.alarm() in math_verify to fix silent scoring failure in Ray workers by @farazkh80 in #5635
[megatron] feat: support cp for bshd format by @wuxibin89 in #5826
[megatron, fsdp] feat: DP workload balance for SFT by @arvyanh in #5679
[doc] chore: add news for PyTorch Conference Europe 2026 by @HollowMan6 in #5847
[doc] chore: update README.md by @wuxibin89 in #5850
[misc] feat: add agent instructions, skills & improve CI for easier tests by @tongyx361 in #5846
[rollout, trtllm] fix: add missing __init__.py to trtllm_rollout package by @Superjomn in #5857
[misc] fix: license for verl/workers/rollout/trtllm_rollout/__init__.py by @tongyx361 in #5862
[vllm] fix: Fix vLLM synchronization error caused by SGLang skipping resume optimize by @ZLiao097 in #5866
[cfg] refactor: unify ppo_trainer and ppo_megatron_trainer config by @wuxibin89 in #5848
[fully_async] chore: Update fully async dapo qwen3-30b npu script by @wangshuyang31 in #5864
[doc] chore: add npu faq doc by @hustmf in #5871
[docker, ci] fix: all CIs, transformers upgrade to 5.3.0 and vllm==0.18.0 by @ETOgaosion in #5724
[doc] feat: add NVFP4 QAT documentation by @zhangyimi in #5861
[megatron, cfg] feat: add Qwen3.5-122B Megatron launch script by @none0663 in #5874
[tool] feat: verl integrate msprobe data collection by @Tjh-UKN in #5186
[ci] fix: rename fsdp-vlm to megatron-vlm in trtllm cleanup needs by @Superjomn in #5880
[megatron] fix: support critic model by @wuxibin89 in #5870
[trainer] fix: handle empty response_mask in calculate_debug_metrics by @Jackie2049 in #5860
[4/n][trainer] feat: flowgrpo - add diffusers + fsdp engine support by @zhtmike in #5802
[ci] fix: fix machine label for nightly_ascend.yml by @yyyy2000 in #5887
[cfg] fix: sync strategy from ActorConfig/CriticConfig to EngineConfig by @yifannnwu in #5885
[megatron] fix: enable_routing_replay fails with MLATransformerConfig… by @NoonePauseferg in #5884
[model] fix: replace inplace += with out-of-place addition in dummy visual forward by @reonokiy in #5881
[megatron] fix: ValueError when unpacking preprocess_thd_engine result in router replay by @guillemgt in #5891
[trainer] feat: add mindspeedllm backend engine support on NPU. by @pengnuoheng in #5680
[ci, trtllm] test: speed up trtllm CI by using smaller models and reducing test parameters by @shikicloud in #5856
[reward] fix: restore timeout in math_verify via ProcessPoolExecutor by @MaxwellJryao in #5839
[ckpt, trainer] feat: Add plugin hooks for custom CheckpointEngineManager and CheckpointEngine by @NaomiEisen in #5718
[doc] chore: Bug fixes for the qwen3-235b model in 256k scenarios by @autbuster in #5908
[ckpt] fix: load custom_backend_module in CheckpointEngineManager on driver by @yangspirit in #5911
[doc] fix: fix non‑compliant sections by @fh188 in #5913
[megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU by @ZLiao097 in #5904
[rollout] chore: bump up trtllm image version to 1.3.0rc10 by @Superjomn in #5841
[trainer,perf] fix: enable profiler for SFT trainer by @wuxibin89 in #5909
[misc] feat: Update file logger path output to absolute path by @vermouth1992 in #5924
[training_utils, hardware] refactor: standardize deterministic environment variables for NCCL and NPU by @xuy1234 in #5923
[ci] chore: add vllm_ascend.yaml by @Annarine in #5759
Revert "[megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU" by @wuxibin89 in #5942
[vllm] fix: remove redundant clone in weight refit by @wuxibin89 in #5934
[ci] chore: add nightly npu docker for v0.7.1 by @yyyy2000 in #5930
[sglang] fix: Adapting the use of _launch_subprocesses to the latest SGLang branch by @xiazhahe in #5868
[megatron] fix: Adjust the attention mask shape for VLM with Megatron on NPU by @ZLiao097 in #5945
[sglang] fix: sglang empty result problem by @Begunner in #5936
[docker] feat: Add GB200 (aarch64/Blackwell) Docker image and training example by @kaixih in #5596
[trainer] feat: add new trainer with TransferQueue by @wuxibin89 in #5401
[megatron] fix: MTP loss deadlock when using context parallelism by @xhx1022 in #5895
[ci] fix: indentation error in one step off policy e2e ci by @HollowMan6 in #5960
[ci] chore: Update ascend related files code owner by @FightingZhen in #5982
[fully_async]fix: terminated training when streaming_generation raise exception by @Zhikaiiii in #5977
[rollout] fix: prevent engine_kwargs from overwriting KvCacheConfig in trtllm rollout by @Superjomn in #5939
[ci] chore: Add veomni npu ci test by @wangshuyang31 in #5935
[doc] chore: add rloo advantage estimator example script for npu by @zjchenn in #5950
[trainer] fix: return NaN for empty tensors in compute_data_metrics by @Jackie2049 in #5899
[reward] feat: add compute_score timing metrics to agent loop by @Stonesjtu in #5971
[fully_async] feat: enable fully async to log_val_generations by @Begunner in #5988
[ci, vllm] chore: update vllm-omni 0.18.0 official release and Miscellaneous by @AndyZhou952 in #5809
[doc] fix: move low precision doc by @sophiayyya in #5994
[rollout, vllm] fix: auto-convert disable_mm_preprocessor_cache to mm_processor_cache_gb for vllm >= 0.13.0 by @Silas-11 in #5961
[fsdp] feat: qwen3.5 add npu docker file by @ruanhao566 in #5991
[perf] feat: simplify precision_debugger config behavior and docs by @Tjh-UKN in #5986
[doc] feat: move msprobe to ascend_tutorial by @tardis-key in #6004
[misc, fully_async] feat: add Qwen3-VL-8B fully async GRPO training script on geo3k by @Silas-11 in #6006
[megatron] fix: update patch for MLA flashattn forward by @HollowMan6 in #6005
[veomni] feat: bump veomni to v0.1.8 by @deerlu in #5900
[megatron] fix: add missing FP8 padding for router replay by @eternally-z in #5989
[megatron, trainer] fix: respect calculate_entropy config in megatron actor update by @MaxwellJryao in #6016
[ci] chore: add sglang new version docker for NPU by @xiazhahe in #6021
[data] fix: pad data in preprocess_packed_seqs if shorter than align_size by @beirong8kmiles in #6001
[tool, rollout, cfg] feat: per-sample tool environment routing for ToolAgentLoop by @pull-ups in #5978
[fsdp] feat: qwen3.5 modify npu docker file based on CANN 8.5.2 by @ruanhao566 in #6017
[ci] fix: update docker-build-ascend-a3-qwen3_5 by @yyyy2000 in #6022
[fully_async] fix: add fully async grpo qwen3-235b npu script in main branch by @wangshuyang31 in #6012
[veomni] feat: add DeepSeek-V3 to MOE_PARAM_HANDERS by @Luosuu in #5996
[ci] chore: qwen3.5 add docker file add x86 CANN8.5.2 by @ruanhao566 in #6031
[misc] chore: remove deprecated requirements.txt by @wuxibin89 in #6032
[data, trainer] fix: batch padding for multi-trajectory by @ZhentaoFan in #5969
[fully_async] fix: replace routed_experts on partial rollout resume i… by @NoonePauseferg in #6029
[veomni] fix: use local paths for VeOmni model loading by @Luosuu in #6034
[trainer,algo] feat: Support On-Policy Distillation in main_ppo_sync by @0oshowero0 in #5997
[trainer] fix: add missing rollout dump and corrected validation logging in main_ppo_sync by @guillemgt in #6024
[5/n][trainer] feat: flowgrpo trainer by @zhtmike in #5951
[trainer, rollout, algo] refactor: Remove OPD colocate mode by @JacobHelwig in #6039
[rollout] fix: RM sleep/wake teacher replicas by @JacobHelwig in #6041
[fully_async] fix: preserve per-iteration routed_experts on partial rollout resume by @NoonePauseferg in #6046
[misc] fix: project name refactor - volcengine -> verl by @ETOgaosion in #6053
[veomni] feat: support Qwen3.5 SP and add GRPO trainer demo using VeOmniEngine by @deerlu in #6061
[rollout] chore: single turn agent loop also enable rollout trace as tool loop by @pengwu22 in #6048
[trainer,cfg,rollout,algo] feat: Multi-Teacher OPD by @JacobHelwig in #6051
[fully_async] fix: avoid blocking ray.get inside async actor methods by @yxs in #6052
[rollout] feat: add inter-node TRT-LLM rollout support for trtllm by @Superjomn in #5992
[fully_async] fix: Add Mindspeed Patch for Async Training on Ascend NPUs by @HwCARI in #5967
[fully_async, rollout, trainer, tool, cfg] fix: ROCm async training compatibility for AMD MI300X by @xiaohong42 in #6062
[algo] fix: strip '+' suffix in kl_penalty so k3+/low_var_kl+ work by @MaxwellJryao in #6058
[recipe, cfg] fix: update NPU script parameters for Qwen3 GSPO and DAPO math recipes by @zjchenn in #6066
[fully_async] fix: fix rollouter/idle compute in async-mode by @chenjiaoAngel in #6069
[BREAKING] [misc] refactor: deprecate workers, migrate to engines by @ETOgaosion in #6067
[BREAKING] [env] refactor: deprecate verl/interactions by @ETOgaosion in #6074
[veomni] feat: enable VeOmni engine for on-policy distillation by @hjshi84 in #6072
docs: fix typo Hellasage -> HellaSwag by @nowang6 in #6084
[fully_async] Fix: fix fully async profiler for first step. by @Shangwei-Li in #6070
[fully_async] Fix: Remove _fit_torch_memory for separation trainer. by @Shangwei-Li in #6075
[fsdp, perf] fix: skip redundant to(cuda) and gc.collect in train_mode when offload is disabled by @evmanz in #5753
[ci] chore: change some npu ci test yml machine by @yyyy2000 in #6043
[fully_async] fix: allow drain loop to resume early on parameter sync when partial_rollout is enabled by @Begunner in #6090
[fsdp] feat: Qwen3.5 Adds Docker Files Based on CANN8.5.2 A2 by @ruanhao566 in #6098
[trainer] fix: include uid and sort by uid in validation generation dumps in main_ppo_sync by @guillemgt in #6101
[ci] chore: update npu docker build pipeline by @yyyy2000 in #6102
[ci] fix: remove spmd test by @tardis-key in #6103
[docker] chore: update npu v0.7.1 docker pipeline by @yyyy2000 in #6106
[fully_async] Fix: fix megatron save and offload in case param_offload is on by @Shangwei-Li in #6095
[sglang] feat: Patch sglang to support on-policy distillation teacher by @mingruimingrui in #6120
[sglang] feat: restrict abort and resume requests to primary server only by @AkiRusProd in #6109
[trainer] fix: preserve jagged tensor layout when rebuilding nested tensors with same sequence length by @huaiyizhao in #6127
[rollout] feat: improve error messages for malformed tool calls by @xiefan46 in #6055
[model] feat: support qwen35 mtp sft/rl by @zpltys in #5898
[rollout] fix: truncate routed_experts to response_length to match truncated response_ids by @armorbreak001 in #6089
[fully_async, reward] feat: enable GenRM/DisRM support in fully async training by @xiefan46 in #6044
[trainer] fix: support non-last ragged dim in nested tensor rebuild by @xiefan46 in #6149
[repo] refactor: move experimental/vla to standalone verl-vla repository by @Miical in #6162
[doc] fix: update dapo multi model optimization practice by @ChibiQuest in #6161
[algo, fsdp, megatron, cfg] fix: wire up sum_pi_squared for optimal_token_baseline by @startju in #6153
[fully_async] fix: fix custom reward function register on async mode by @chenjiaoAngel in #6166
[trainer] feat: extend CheckpointEngineManager and AgentLoopManager hooks to separation and one-step-off trainers by @NaomiEisen in #6173
[vllm] chore: fix sleep_level in Ascend by @wucong25 in #6170
[trainer] fix: convert numpy arrays to native types before dumping rollout JSONL by @nev8rz in #6167
[doc] fix: update dapo multi model optimization practice by @ChibiQuest in #6169
[doc] chore: fix ascend quick start by @wucong25 in #6174
[rollout,vllm] feat: split large weight into chunks in NCCL/NIXL checkpoint engine by @wuxibin89 in #6091
[ci] fix: engine_mindspeed_llm_rl_job switch to A3. by @pengnuoheng in #6020
[ci] chore: add sgl_ascend cu for NPU by @Annarine in #6085
[doc] chore: Add DART-GUI project to README by @Pengxiang-Li in #6192
[worker] fix: optimizers don't have to be on the same device as the model by @HollowMan6 in #6196
[trainer] fix: remove actor.dump_memory_snapshot by @tardis-key in #6198
[worker] fix: grad_norm as non-tensor for metrics by @HollowMan6 in #6195
[BREAKING][misc] refactor: migrate Diffusion RL stack to verl-omni by @SamitHuang in #6200
[fsdp] fix: honor mixed_precision.param_dtype in forward_step autocast (#5932) by @shivam2199 in #6150
[BREAKING][rollout] refactor: move LLMServerManager out of AgentLoopManager by @wuxibin89 in #6129
[ci] chore: bump trtllm to 1.3.0rc13 and verl to v0.7.1 by @Superjomn in #6215
[misc] refactor: re-format examples and deprecate old examples by @ETOgaosion in #6126
[megatron] fix: avoid 2x peak host memory on Megatron model offload by @acmore in #6193
[misc] fix: add missing __init__.py files to package directories by @guillemgt in #5209
[veomni] feat: use VeOmni's native return_log_probs path to compute log_probs by @Luosuu in #6184
[reward] refactor: refactor RM score assembly in reward loop by @zhtmike in #6242
[tool] fix: In the memory snapshot collection logic, opening history records is not compatible with NPU by @shaanjiangcun in #6216
[megatron] fix: fix seq_len pad len, and adapt to new mtp_loss api (for megatron dev branch) by @zpltys in #6206
[ci] chore: remove fastmcp deps for sglang ci by @wuxibin89 in #6249
[trainer] fix: dump all outputs in validation in main_ppo_sync by @guillemgt in #6227
[rollout,vllm] fix: include Ray job id in colocated weight-transfer IPC path by @timothygao8710 in #6246
[trainer] fix: update TorchTitanEngine for latest torchtitan API by @acisseJZhong in #6231
[reward] fix: compute correct rollout world size by @guillemgt in #6226
[fully_async, rollout] feat: enable online policy distillation in fully async training by @xiefan46 in #6056
[sglang] feat: SGLang Prefill-Decode disaggregated rollout by @yxs in #6117
[rollout] fix: guard sglang profiling when self.tokenizer_manager is None by @LeiDing191 in #6217
[tool] feat: Memory snapshot collection, add functionality to clear history after collection. by @shaanjiangcun in #6248
[doc] add RandOpt to readme awesome work by @sunrainyg in #6257
[rollout] fix: trtllm rollout docker image and a few scripts by @hchings in #6230
Revert "[reward] fix: compute correct rollout world size" by @wuxibin89 in #6258
[doc] feat: add code reviewer by @ArronHZG in #6260
[trainer] fix: write request_id to reward_extra_infos_to_dump instead of reward_extra_infos_dict by @boundless-future in #6251
[ci] fix: remove the redundant config by @yyyy2000 in #6253
[rollout] feat: enable Async RL for trtllm rollout by @hchings in #5631
[ci] chore: bump trtllm to 1.3.0rc14 and pin mbridge by @Superjomn in #6262
[veomni] feat: add Qwen3.5-122b-a10b GRPO trainer demo with EP enabled using VeOmniEngine by @deerlu in #6264
[docker] feat: bump aarch64 vllm 0.17->0.18 by @kaixih in #6222
[tool] feat: simpler function-based tool registration by @Begunner in #6189
[doc] fix: router_replay is now under megatron by @HollowMan6 in #6272
[reward, cfg] fix: correctly use RewardModelConfig in reward config by @guillemgt in #6265
[ci] chore: slim down TRT-LLM CI by @Superjomn in #6275
[doc] fix: correct module path in fully_async_policy documentation by @GJWu-zyx in #6287
[megatron] fix: fix bugs when using position_ids in cp by @Kite0011 in #6267
[ci] feat: add gspo qwen3-30b in nightly npu ci by @yyyy2000 in #6273
[rollout] fix: skip_tokenizer_init=True for OPD teacher by @wuxibin89 in #6296
[tool] refactor: tools will be initialized in AgentLoopWorker by @Begunner in #6300
[reward, trainer] feat: support multi-output trajectories in async reward scoring by @guillemgt in #6228
[BREAKING][tool, data] refactor: remove curriculum sampler + dynamic dataset + tool examples by @Begunner in #6302
[model] chore: refactor npu scripts by @wucong25 in #6285
[doc] chore: update vllm and vllm ascend from 0.13.0 to 0.18.0 in docs and dockerfile by @wangshuyang31 in #6291
[megatron] fix: the NPU error that occurs after migrating from megatron worker to engine worker. by @xiazhahe in #6135
[trainer] fix: combine REMAX sampled and greedy samples in one rollout request by @liziniu in #6308
[vllm] refactor: MXFP8 support for ascend NPU by @quancs in #6307
[megatron] feat: support Megatron-FSDP mode for Megatron backend by @conver334 in #5423
[ci] chore: bump trtllm CI image to 1.3.0rc14 by @Superjomn in #6269
[fully_async] feat: reuse trainer worker group for hybrid rollout to do validation by @ArronHZG in #6076
[misc] refactor: re-format npu examples by @beirong8kmiles in #6286
[ci] chore: add sglang ci for NPU by @xiazhahe in #6015
[worker] feat: support log memory in engine worker by @yyyy2000 in #6270
[doc] refactor: ascend doc refactor of precision guide and dockerfile build guidance by @yyyy2000 in #6298
[data] fix: forward apply_chat_template_kwargs to system prompt measurement by @MohammadShahdad in #6305
[fsdp] fix: FSDP2 silently drops fsdp_config.forward_prefetch by @memset0 in #6317
[megatron] fix: fix bug with mcore0.12.1 + torch2.9.0 by @yyyy2000 in #6322
[data, rollout] feat: add audio data support by @SanftMonster in #6276
[doc] refactor: Ascend docs rectification, add parameter and metrics descriptions by @nuerxiati in #6294
[doc] refactor: Ascend docs rectification, add new FAQ questions by @nuerxiati in #6328
[model, fsdp] fix: Fix modeling_qwen2_5_vl missing attribute 'Qwen2RMSNorm' by @ZhuYajun-AI in #5901
[fsdp, fully_async] feat: add Qwen3-VL-30B-A3B fully async GRPO training script on geo3k by @zhihaofang1017 in #6131
[fsdp, fully_async] fix: fix CI import fast_pos_embed_interpolate in Qwen3-VL by @zhihaofang1017 in #6332
[docker,vllm] feat: Enable DeepEP in ARM stable image by @kaixih in #6326
[misc] chore: Update for vexact release by @pengwu22 in #6336
[model, fsdp] fix: honor SP-rolled labels in fused kernels (#6068) by @shivam2199 in #6268
[fsdp, ckpt] fix: drop tied target keys before HF save_pretrained by @ChangyiYang in #6334
[fsdp] fix: lenient resolution of _no_split_modules in get_fsdp_wrap_policy by @SteadfastAsArt in #6290
[doc] chore: split install guidance and quickstart by @Mengyuyang in #6337
[trainer] feat: support ReMax in synchronous TransferQueue trainer by @liziniu in #6340
[doc] chore: add npu advanced features by @wucong25 in #6339
[doc] refactor: added the document that collects statistics on models and algorithms that support the NPU by @zhouhengan1211 in #6347
[ci] fix: change model_path to local cache dir by @wuxibin89 in #6351
[fsdp] fix: build no-padding attention mask from input ids by @anzhsoft in #6345
[fsdp] fix: emit distillation outputs in use_remove_padding=False path (#6293) by @abinggo in #6350
[tool] fix: tool response truncate side by @haoyang9804 in #6313
[algo] fix: vectorized grpo low-variance scaling by @haoyang9804 in #6348
[doc] chore: verl Ascend doc refactor by @hustmf in #6353
[doc] chore: NPU model migration guidance by @Mind-s in #6330
[doc] chore: fix ascend doc link by @hustmf in #6359
Revert "[fsdp] fix: emit distillation outputs in use_remove_padding=False path (#6293)" by @wuxibin89 in #6360
[fsdp, ckpt] chore: fold drop_tied_target_keys into top-level import by @ChangyiYang in #6356
[doc] chore: OPD docs by @JacobHelwig in #6358
[hardware] add DT flops by @brook-cpp in #6363
[doc] fix: delete sglang_multiturn by @xvlincaigou in #6379
[trainer,data] fix: Support merging extra_info for main_ppo_sync & update TQ dependency by @0oshowero0 in #6354
[trainer] feat: add set_expandable_segments support for npu by @ji-huazhong in #6346
[trainer] feat: deprecate main_ppo.py warning by @wuxibin89 in #6384
[trainer] feat: async generation dump with exception propagation and streaming write by @Jackie2049 in #6324
[doc] chore: announce VeRL-Omni pre-release in README News by @SamitHuang in #6390
[doc] refactor: update rocm doc by @mingjielu in #6388
[fsdp] fix: emit distillation outputs in use_remove_padding=False path by @abinggo in #6386
[ci] fix: use TRTLLM_TEST_MODEL_PATH_ROOT in test_trtllm_rollout_utils by @Superjomn in #6385
[misc] fix: use directory-symlink layout for shared skills by @tongyx361 in #6391
[veomni] feat: Add GRPO training scripts for Qwen3-VL-30B-MOE (VeOmni Backends) by @phdddd in #5275
[veomni] feat: add veomni qwen3-30b and fix ep by @wangshuyang31 in #6323
[doc] chore: Update Ascend Docker build guidance by @anzhsoft in #6399
[megatron, cfg] feat: add Qwen3.5-35B Megatron-Bridge launch script on Ascend by @Zhang1Sheng in #6318
[perf, hardware] feat: NPU supports Liger-Kernel by @zheliuyu in #6244
[fsdp] feat: Support zero2 optional feature for FSDP1 in engine worker. by @ZLiao097 in #6410
Update installation instruction for TransferQueue by @emmericp in #6420
[tool] feat: add Gemma4 tool parser with stop token and response formatting by @nanastassacos in #6406
[fsdp] fix: sort buffers in fsdp2_load_full_state_dict to prevent NCCL deadlock with heterogeneous buffers by @nanastassacos in #6405
[megatron] chore: refactor to use Megatron-Bridge new APIs by @HollowMan6 in #6335
[doc] chore: Update ascend doc link in README.md by @hustmf in #6427
[megatron] fix: set use_mbridge to True for some npu scripts by @zjchenn in #6429
[doc] fix: fix ascend dockerfile_build_guidance as issue by @yyyy2000 in #6400
[misc] fix: device variable not bound in some scripts by @zjchenn in #6430
[megatron, cfg] feat: add Qwen3-VL-30B mbridge launch script on Ascend by @Seren-hao in #6443
[docker] chore: update vllm 0.20.2 image by @ETOgaosion in #6393
[sglang, one_step_off] fix: add free_cache_engine guard to resume_kv_cache by @dafu-wu in #6442
[ckpt, model] fix: save LoRA train metadata for PPO actor checkpoint by @Yatogaii in #6409
[fully_async] fix: initialize _dump_executor in FullyAsyncTrainer and FullyAsyncRollouter by @nanastassacos in #6438
[ci] fix: the qwen3 model replaces the qwen25 model by @daikang6 in #6398
[rollout] fix: fix fp8 for async RL and multinode rollout by @hchings in #6344
[veomni] feat: add VeOmni-native critic support by @Luosuu in #6453
[fully_async,doc] feat: rm future plans, almost all completed. by @ArronHZG in #6457
[rollout] feat: add Trackio rollout trace logging by @abidlabs in #6423
[model] refactor: clean up outdated Qwen2_5_vl code implementation by @ji-huazhong in #6445
[megatron,rollout] fix: align MTP loss and rollout metrics by @xhx1022 in #6432
[docker] fix: align stable vllm ARM version with x86 by @kaixih in #6460
[ci, trainer] fix: fix code, scripts and st for mindspeedllm backend by @pengnuoheng in #6316
[fsdp] fix: device mismatch between fsdp2 offload and weights transfer by @ETOgaosion in #6463
[ci] chore: update npu docker to cann 9.0.0 by @yyyy2000 in #6466
[veomni] feat: wire MoE load-balance monitor into VeOmni engine by @Luosuu in #6470
[fully_async] feat: fully async profiling by @tardis-key in #6461
[fsdp, megatron, trainer] feat: add top-k distillation overlap metrics by @Turingzero0 in #6469
[trainer, rollout, cfg] feat: add extension points for custom worker configs by @Luosuu in #6489
[veomni] fix: VeOmniEngineWithValueHead loads ForTokenClassification by @Luosuu in #6488
[trainer] fix: gracefully shutdown trainer with TransferQueue by @wuxibin89 in #6491
[veomni] feat: add MoE router replay (R2/R3) support by @hjshi84 in #6325
[fsdp, model] feat: support qwen3_5 ulysses sp by @SaltFish11 in #6482
[ci] chore: requirements-npu add triton-ascend by @wucong25 in #6493
[rollout, vllm] fix: use engine.sleep() instead of collective_rpc by @dafu-wu in #6456
[ci] chore: triton-ascend==3.2.1 needs install path in docker by @yyyy2000 in #6498
[docker] chore: upgrade sglang to 0.5.12 by @ETOgaosion in #6435
[rollout, vllm] fix: treat null rollout seed as 0 for engine init by @SamitHuang in #6503
[fsdp] fix: add sp and use_remove_padding validation for SFT and RL in fsdp engine by @fisherxu in #6502
[rollout] feat: enable MooncakeStoreConnector with hard-reset on weight update by @aoshen02 in #6373
[megatron] feat: ascend bump into megatron 016 by @wangshuyang31 in #6374
[megatron, trainer] fix: preserve BSHD top-k distillation shape by @anzhsoft in #6506
[hardware] chore: remove redundant Dockerfile.rocm7 by @Vivicai1005 in #6514
[cfg] fix: align dataclass defaults with yaml by @anzhsoft in #6494
[doc] feat: add uni-agent release to readme by @yyDing1 in #6516
[ci] fix: Update Dockerfile.ascend_9.0.0_a3 by @yyyy2000 in #6517
[ci] fix: remove the uninstall of triton in ascend docker by @yyyy2000 in #6524
[ci] chore: add npu sglang nightly ci by @hustmf in #6521
[veomni, fsdp] feat: enable fused top-K distillation kernel for OPD by @Luosuu in #6511
[doc] chore: point AMD ROCm section at amd_quick_start.rst by @Vivicai1005 in #6518
[ckpt] feat: pass global steps to checkpoint engines by @athreesh in #6507
[doc] refactor: optimize ascend doc by @hustmf in #6532
[doc] chore: fix verl ascend readme by @wucong25 in #6534
[ci] chore: npu ci use cann9.0.0 by @daikang6 in #6520
[veomni] fix: VeOmniEngineWithValueHead token-cls lookup default value for transformers v5 by @Luosuu in #6540
[model] fix: support trl>=0.29 AutoModelForCausalLMWithValueHead import by @wenzhaoabc in #6539
[ci] fix: FSDP actor/critic ci fail by @wuxibin89 in #6550

New Contributors

@Gary-cjy made their first contribution in #5215
@AnikiFan made their first contribution in #5550
@Fan-Yunfan made their first contribution in #5464
@HuiyingLi made their first contribution in #5407
@knlnguyen1802 made their first contribution in #5616
@Luosuu made their first contribution in #5662
@koanho made their first contribution in #5705
@Solus-sano made their first contribution in #5689
@yyZhangAI made their first contribution in #5725
@stas00 made their first contribution in #5735
@FrankHo-Hwc made their first contribution in #5742
@chenyingshu made their first contribution in #5713
@JenniferWang made their first contribution in #5591
@zhijie-os made their first contribution in #5756
@zyang6 made their first contribution in #5556
@beirong8kmiles made their first contribution in #5734
@0lynnlin0 made their first contribution in #5791
@Zhang1Sheng made their first contribution in #5682
@AndyZhou952 made their first contribution in #5716
@SanftMonster made their first contribution in #5823
@farazkh80 made their first contribution in #5635
@Tjh-UKN made their first contribution in #5186
@Jackie2049 made their first contribution in #5860
@zhtmike made their first contribution in #5802
@yifannnwu made their first contribution in #5885
@NoonePauseferg made their first contribution in #5884
@reonokiy made their first contribution in #5881
@pengnuoheng made their first contribution in #5680
@shikicloud made their first contribution in #5856
@MaxwellJryao made their first contribution in #5839
@NaomiEisen made their first contribution in #5718
@yangspirit made their first contribution in #5911
@fh188 made their first contribution in #5913
@xuy1234 made their first contribution in #5923
@kaixih made their first contribution in #5596
@Zhikaiiii made their first contribution in #5977
@Stonesjtu made their first contribution in #5971
@ruanhao566 made their first contribution in #5991
@deerlu made their first contribution in #5900
@pull-ups made their first contribution in #5978
@ZhentaoFan made their first contribution in #5969
@yxs made their first contribution in #6052
@HwCARI made their first contribution in #5967
@xiaohong42 made their first contribution in #6062
@nowang6 made their first contribution in #6084
@evmanz made their first contribution in #5753
@AkiRusProd made their first contribution in #6109
@xiefan46 made their first contribution in #6055
@armorbreak001 made their first contribution in #6089
@startju made their first contribution in #6153
@nev8rz made their first contribution in #6167
@Pengxiang-Li made their first contribution in #6192
@SamitHuang made their first contribution in #6200
@shivam2199 made their first contribution in #6150
@acmore made their first contribution in #6193
@shaanjiangcun made their first contribution in #6216
@timothygao8710 made their first contribution in #6246
@LeiDing191 made their first contribution in #6217
@sunrainyg made their first contribution in #6257
@boundless-future made their first contribution in #6251
@GJWu-zyx made their first contribution in #6287
@MohammadShahdad made their first contribution in #6305
@memset0 made their first contribution in #6317
@ZhuYajun-AI made their first contribution in #5901
@SteadfastAsArt made their first contribution in #6290
@Mengyuyang made their first contribution in #6337
@zhouhengan1211 made their first contribution in #6347
@anzhsoft made their first contribution in #6345
@abinggo made their first contribution in #6350
@haoyang9804 made their first contribution in #6313
@brook-cpp made their first contribution in #6363
@mingjielu made their first contribution in #6388
@phdddd made their first contribution in #5275
@emmericp made their first contribution in #6420
@nanastassacos made their first contribution in #6406
@dafu-wu made their first contribution in #6442
@abidlabs made their first contribution in #6423
@Turingzero0 made their first contribution in #6469
@SaltFish11 made their first contribution in #6482
@fisherxu made their first contribution in #6502
@aoshen02 made their first contribution in #6373
@Vivicai1005 made their first contribution in #6514
@athreesh made their first contribution in #6507
@wenzhaoabc made their first contribution in #6539

Full Changelog: v0.7.1...v0.8.0