verl-project/verl v0.7.0

24 days ago

v0.7 release

Blog post: verl 0.7 release blog

Highlights

Model Engine

  • Integrate Megatron-Bridge and support LoRA/PEFT; see the blog post: How We Build Trillion Parameter Reasoning RL with 10% GPUs
  • Support experimental FP8 training for the Megatron backend
  • Support new models for the Megatron backend: GPT-OSS, Qwen3-Next
  • Comprehensive support for the new model engine; the FSDP and Megatron engines are production-ready.
    • Dispatch TensorDict with nested tensors instead of padded DataProto
    • Add TrainingWorker with a Tinker-like API
    • Add VLM support to the model engine, SFT trainer, and RL trainer
    • Add a model-engine-based critic
    • Implement ActorRolloutRefWorker on top of TrainingWorker, supporting different backends in one worker
  • Add a new VeOmni engine, still in alpha status.
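The switch from padded DataProto to nested-tensor dispatch avoids moving padding tokens between workers. A minimal PyTorch sketch of the difference (illustrative only; verl's actual DataProto/TensorDict plumbing differs):

```python
import torch

# Three variable-length "sequences", e.g. tokenized responses.
seqs = [torch.arange(3), torch.arange(5), torch.arange(2)]

# Padded batch: every row is stretched to the max length (5),
# so shorter rows carry wasted padding positions.
padded = torch.nn.utils.rnn.pad_sequence(seqs, batch_first=True)
assert padded.shape == (3, 5)

# Nested tensor: each row keeps its true length, so dispatching
# the batch between workers moves no padding at all.
nested = torch.nested.nested_tensor(seqs)
assert nested.is_nested
assert [t.numel() for t in nested.unbind()] == [3, 5, 2]
```

With large rollout batches of very uneven response lengths, the padded representation can be dominated by padding, which is what the nested-tensor dispatch avoids.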

Rollout Engine

  • Remove the SPMD rollout mode
  • Support blockwise FP8 rollout for vllm and sglang; support online quantization for vllm with TorchAO
  • Experimental router-replay support for vllm
  • Optimize multi-modal data fetching and preprocessing; support video input
  • Upgrade to vllm==0.12.0 and sglang==0.5.6

Reward

  • Support hybrid reward scenarios, including generative, discriminative, and rule-based rewards, and their combinations.
  • Refactor reward models into server mode, supporting both colocated and standalone deployments.
  • Introduce new reward managers for more complex scenarios: a limited mode for request-rate control and a remote mode for CPU-intensive tasks.
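The limited mode amounts to capping concurrent calls to an external reward judge. A minimal asyncio sketch of the idea (the class and argument names here are hypothetical, not verl's actual reward-manager API):

```python
import asyncio

class RateLimitedRewardClient:
    """Cap in-flight requests to a remote reward judge with a semaphore.

    Illustrative sketch only -- class and argument names are
    hypothetical, not verl's reward-manager API.
    """

    def __init__(self, score_fn, max_concurrency: int = 2):
        self._sem = asyncio.Semaphore(max_concurrency)
        self._score_fn = score_fn

    async def score(self, prompt: str, response: str) -> float:
        async with self._sem:  # blocks until a request slot frees up
            return await self._score_fn(prompt, response)

async def main():
    async def fake_judge(prompt, response):
        await asyncio.sleep(0.01)  # pretend remote-API latency
        return float(len(response))

    client = RateLimitedRewardClient(fake_judge, max_concurrency=2)
    return await asyncio.gather(
        *(client.score(f"p{i}", "ok" * i) for i in range(5)))

rewards = asyncio.run(main())
assert rewards == [0.0, 2.0, 4.0, 6.0, 8.0]
```

All five requests are submitted at once, but at most two are ever in flight, which is the property a rate-limited manager needs when the reward is an external API with quota limits.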

Algorithm

  • Add CISPO: Clipped IS-weight Policy Optimization
  • Add SAPO: Soft Adaptive Policy Optimization
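CISPO changes the PPO-style objective: instead of clipping the ratio and zeroing the gradient of clipped tokens, it clips and detaches the importance-sampling weight, so every token keeps a policy-gradient term. A minimal sketch (hyperparameter names and defaults are illustrative; consult the CISPO paper and verl's config for the exact conventions):

```python
import torch

def cispo_loss(log_prob, old_log_prob, advantages,
               clip_low: float = 1.0, clip_high: float = 0.2):
    """Sketch of the CISPO objective: clip and detach the IS weight,
    keeping a log-prob gradient for every token (PPO-style clipping
    would mask out clipped tokens' gradients instead).

    A large clip_low effectively disables lower clipping; the names
    and defaults here are illustrative, not verl's config keys.
    """
    ratio = torch.exp(log_prob - old_log_prob)
    weight = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high).detach()
    return -(weight * advantages * log_prob).mean()

log_prob = torch.tensor([-1.0, -2.0], requires_grad=True)
old_log_prob = torch.tensor([-1.0, -1.0])
advantages = torch.tensor([1.0, 1.0])

loss = cispo_loss(log_prob, old_log_prob, advantages)
loss.backward()
assert abs(loss.item() - 0.8678794) < 1e-4  # -(1*(-1) + e^-1*(-2)) / 2
assert torch.all(log_prob.grad != 0)  # no token's gradient is masked out
```

Because the clipped weight is detached, clipping only caps how much off-policy tokens are up-weighted; it never removes their contribution to the gradient, which matters for long reasoning rollouts where many tokens drift off-policy.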

Recipe

  • [NEW] VLA: add experimental support for VLA models
  • [NEW] rhymerl: History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
  • TransferQueue: support multiple data partitions and optimize tensor zero-copy serialization
  • One-step-off-policy / fully async: optimize weight synchronization via a checkpoint engine with bucketing and pipelining support.

What's Changed

  • [data] fix: MultiturnSFTDataset handle messages with list args in tool call by @gongyisheng in #4125
  • [ci, doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version by @FightingZhen in #4123
  • [data] fix: fix global_seqlen metric by @conver334 in #4129
  • [ci] fix: Optimize ascend docker build workflow and dockerfile to solve OOM problem by @FightingZhen in #4137
  • [ci] fix: fix error limiting MindSpeed cloning depth to one by @FightingZhen in #4140
  • [ci] feat: specify torch and torch_npu version into ascend dockerfile by @FightingZhen in #4141
  • [ci] fix: move torch and torch_npu install order in ascend dockerfile to ensure installed version correct by @FightingZhen in #4142
  • [ci] fix: Correct version relationship between torch and torchvision in ascend dockerfile by @FightingZhen in #4143
  • [doc] chore: Add one_step_off_policy support doc of Ascend NPU by @baymax591 in #4151
  • [rollout] fix: resource pool name in standalone mode by @PeterSH6 in #4149
  • [ci] feat: Update e2e_ascend CI image to 8.3.RC1 version, remove weekly validation workflow by @FightingZhen in #4146
  • [doc] chore: add pytorch conference materials by @hongpeng-guo in #4161
  • [rollout] fixup load_format=dummy update_weights not do process_weight… by @Annarine in #4130
  • [vllm] fix: Change parameter validation to align with vllm validation by @HelloWorldBeginner in #4153
  • [trainer] fix: reproducible problem when resume training by @wlhgtc in #4156
  • [recipe, tool] feat: support multi-turn and tool call for recipe/fully_async_policy by @sl-1314 in #4067
  • [cfg] fix: add rollout_correcton config field with omegaconf.open_dict by @tongyx361 in #4167
  • [doc] fix: Misc doc fixes by @kerrickstaley in #4171
  • [recipe] feat: add qwen3 8b grpo one_step_off_policy script on ASCEND NPU by @baymax591 in #4163
  • [BREAKING][rollout] feat: change rollout to server mode by default by @wuxibin89 in #4106
  • [algo] feat: Add RateLimitedRewardLoopManager with three-layer rate limiting for API-based rewards by @JoyboyBrian in #4107
  • [megatron] feat: load dist checkpoint with customized prefix for state dict keys. by @shevateng0 in #4139
  • [megatron] fix: Use tokenizer path or model path in config by @ashvinnihalani in #4091
  • [doc] chore: update docker installation guide by @wuxibin89 in #4155
  • [recipe] feat: DeepSeek-R1-Zero on Ascend NPU by @johnjunjun7 in #3427
  • [recipe] fix: compatibility with vLLM Qwen3Next model by @zjchenn in #4184
  • [recipe] fix: readme in recipe/r1_ascend by @HzZHoO in #4183
  • [recipe] fix: ReactAgentLoop error handling for failed LangGraph invocations by @le-czs in #4182
  • [ci] chore: Update e2e_ascend CI trigger policy by @FightingZhen in #4189
  • [recipe] fix: Qwen3-vl npu patch by @leisuzz in #4186
  • [rollout, doc] feat: limit tracing samples by @EricMarcus-ai in #4185
  • [worker, sglang] fix: Rename the file sglang_router.py to avoid circular imports by @Shiguang-Guo in #4187
  • [megatron, recipe] fix: error of megatron init while detached actor and rollout by @lalala-2 in #4179
  • [ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI by @wlf-darkmatter in #3465
  • [rollout, vllm] feat: support blockwise fp8 rollout by @Agoniii in #3519
  • [ci] feat: Move hf_transfer dependency to requirement file by @FightingZhen in #4210
  • [misc] feat: init random model supports custom code in the model by @HollowMan6 in #4217
  • [single_controller] feat: support dispatch tensordict by @vermouth1992 in #4213
  • [recipe, doc, ckpt] fix: error of ckpt in fully async by @lalala-2 in #4199
  • [megatron] feat: FP8 training by @ISEEKYAN in #4223
  • [megatron] feat: moe fp16 training by @HaochenYuan in #4158
  • [recipe] fix: incorrect reward function in fapo scripts by @yyDing1 in #4195
  • [rollout, vllm] feat: support blockwise FP8 rollout for vLLM v0.11 MoE RL by @jQizhang in #4222
  • [single_controller] feat: support multiple replicate worker in one resource pool by @yyDing1 in #4226
  • [megatron] fix: BF16 mode should use PAO as well by @ashvinnihalani in #4221
  • Revert "[megatron] fix: BF16 mode should use PAO as well" by @ISEEKYAN in #4234
  • [doc] feat: Add Search Self-Play to awesome work list by @Necolizer in #4245
  • [worker] feat: add support for colocate replicas by @yyDing1 in #4233
  • [trainer] feat: refactor workers with model engine by @wuxibin89 in #4211
  • [single_controller] feat: support resource pool split method by @yyDing1 in #4251
  • [recipe] fix: tighten async rollouter task handling by @le-czs in #4230
  • Revert "[single_controller] feat: support resource pool split method" by @vermouth1992 in #4258
  • Revert "[worker] feat: add support for colocate replicas" by @wuxibin89 in #4259
  • Revert "[single_controller] feat: support multiple replicate worker in one resource pool" by @vermouth1992 in #4260
  • [ci] fix: Fix triton-ascend unavailable error in Ascend dockerfile by @FightingZhen in #4254
  • [ci] fix: Fix error in ascend dockerfile by @FightingZhen in #4265
  • [rollout] fix: ensure weight sync regardless of free_cache_engine by @JobQiu in #4248
  • [doc] feat: add rollout&train consistency doc for Ascend Platform by @momo609 in #4166
  • [recipe] feat: allow customize agent name by @vermouth1992 in #4269
  • [ci] fix: Remove redundant uninstall command in e2e_ascend by @FightingZhen in #4267
  • [megatron] Fix: fix bugs in mcore backend context-parallel code logic by @Kite0011 in #4250
  • [recipe] feat: add Experimental VLA RL Support by @The-Hierophant in #3918
  • [recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller by @LLLLxmmm in #4175
  • [ci] feat: Increase e2e_sft timeout from 25 to 30 minutes by @vermouth1992 in #4279
  • [megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT by @HollowMan6 in #4063
  • [single_controller] feat: support resource_pool split by @yyDing1 in #4273
  • [recipe] feat: move recipes to new repository verl-recipe by @wuxibin89 in #4283
  • [worker] feat: restore colocate workers based on new splited resource pool by @yyDing1 in #4282
  • [misc] feat: Add actor_rollout_ref.actor.calculate_entropy for entropy fwd by @EduardDurech in #4239
  • [trainer] feat: Self-Normalized Importance Sampling by @EduardDurech in #3980
  • [ci, megatron] fix: add rotary_pos_cos_sin to forward by @HollowMan6 in #4291
  • [megatron] fix: pass trust_remote_code to get_generation_config by @jprellberg in #4196
  • [misc] fix: support nested datastructure in dataproto to convert to tensordict by @PeterSH6 in #4296
  • [ci] fix: use local hf model path by @wuxibin89 in #4299
  • [data] feat: TransferQueue - Support AgentLoop performance metrics & minor fix by @0oshowero0 in #4289
  • [recipe] feat: support reward_loop for recipe/fully_async_policy by @sl-1314 in #4224
  • [misc] fix: fix list conversion in get_tensordict by @PeterSH6 in #4304
  • [hardware] fix: Workaround for torch-npu's lack of support for creating nested tensors from NPU tensors. by @ji-huazhong in #4309
  • [rollout] fix: some compatibility changes in agent loop and reward by @pengwu22 in #4293
  • [worker] fix: do not pass router address and tokenizer is their value is none by @yyDing1 in #4310
  • [doc] chore: Update ascend quickstart doc by @FightingZhen in #4321
  • [misc] feat: add more utils of tensordict by @vermouth1992 in #4322
  • [recipe] fix: Fixed scripts for one_step_off_policy async not implemention by @baymax591 in #4350
  • [model] feat: refactor engine folder structure by @vermouth1992 in #4352
  • [recipe] feat: move char count recipe to verl-recipe by @vermouth1992 in #4351
  • [ci] chore: switch ascend ci calculation resource by @FightingZhen in #4347
  • feat(actor): add loss_scale_factor for seq-mean-token-sum-norm mode by @szrlee in #4360
  • [misc] refactor: clean up unused sharding managers by @ji-huazhong in #4361
  • [worker] feat: Add TrainingWorker that resembles Tinker-like API by @vermouth1992 in #4371
  • [vllm] fix: Fix issues that occur during the ACLGraph initialization process in the NPU. by @chengminhua in #4209
  • [megatron] feat: support gpt-oss by @ISEEKYAN in #4323
  • [megatron] fix: megatron async save ckpt fix by @Leem-Li in #4253
  • [misc] feat: Update news section in README.md by @vermouth1992 in #4385
  • [misc] fix: handle empty TensorDict in DataProto serialization by @le-czs in #4379
  • [trainer,fsdp] feat: enable reproducibility for training by @ji-huazhong in #4378
  • [trainer] feat: support ray-based sft trainer by @vermouth1992 in #4382
  • [megatron] feat: optimize the mbridge checkpoint saving speed by @ISEEKYAN in #4386
  • [rollout] feat: add support for discriminative reward model in reward loop by @yyDing1 in #4358
  • [recipe] feat: refactor one step off to support server mode by @ArronHZG in #4307
  • [misc] feat: support TensorDict in DataProtoFuture by @vermouth1992 in #4395
  • [fsdp] fix: Fixing the error caused by empty tensors in the multi_turn + remove_padding scenario by @nuerxiati in #4165
  • [doc] fix: add Geo-RS-Seq-TIS estimators and update documentation by @szrlee in #4359
  • [worker] feat: custom master addr port by @tongyx361 in #4389
  • [doc] feat: update reward loop document by @yyDing1 in #4404
  • [algo] feat: support router replay by @litianjian in #4101
  • [recipe] fix: FlowRL actor to pure implementation by @Xuekai-Zhu in #4397
  • [doc] feat: add more user instructions to reward loop doc by @yyDing1 in #4409
  • [doc] feat: add OneThinker link in readme by @appletea233 in #4410
  • [ci] fix: NPU not support router replay by @wuxibin89 in #4414
  • [worker] feat: custom reward_manager by @tongyx361 in #4387
  • [vllm] feat: retires vllm spmd mode in the codebase by @PeterSH6 in #4411
  • [sglang] fix: HTTP server startup issues for Prometheus and Grafana integration by @jsfanfanfan in #4408
  • [doc] chore: Update ascend quickstart and docker build guidance doc by @FightingZhen in #4420
  • [sglang] feat: retires sglang spmd mode in the codebase by @PeterSH6 in #4422
  • [fsdp] feat: update NPU fused kernels for Qwen3 moe block by @icerain-alt in #4406
  • [misc] refactor: clean up unused sharding manager by @ji-huazhong in #4439
  • [hardware] chore: clean npu_patch by @FightingZhen in #4436
  • [misc] fix: fix memory leakage when initializing multiple tools by @PeterSH6 in #4430
  • [trainer, vllm, megatron, recipe] feat: one/two step off async on-policy distillation recipe by @moehanabi in #3975
  • [misc] feat: optimize performance of index_select_tensor_dict by @vermouth1992 in #4444
  • [ci] test: Disable ReMax training test in vllm workflow by @PeterSH6 in #4445
  • [rollout] fix: RolloutConfig should support repetition_penalty config… by @Lokiscripter in #4398
  • [recipe] feat: add fully async comm between rollout and sim node in disagg mode by @HanlinDu in #4433
  • [misc] feat: optimize nested tensor index by @vermouth1992 in #4447
  • [model] feat: add qwen3-4b grpo script on ASCEND NPU A3 by @5082459 in #4432
  • [megatron] fix: Remove Deprecated Megatron Optimizer Args by @DaizeDong in #4396
  • [megatron] fix: respect use_distributed_optimizer in config by @HollowMan6 in #4392
  • [recipe, ci] fix: remove batch mode for remote generative reward model by @yyDing1 in #4448
  • [misc] feat: optimize rearrange_micro_batches by @vermouth1992 in #4451
  • [rollout, sglang] feat: support blockwise fp8 rollout by @Agoniii in #4415
  • [trainer] feat: model engine sft trainer support vlm model by @wuxibin89 in #4403
  • [trainer] feat: add reward loop config to default config by @yyDing1 in #4452
  • [vllm] feat: support abort generating requests in vllm server by @PeterSH6 in #4453
  • [ci] chore: cleanup some ci workflow by @wuxibin89 in #4459
  • [trainer] feat: allow override for reward_manager_worker in agent loop by @ryxli in #4423
  • [model] feat: enhances TrainingWorker by @vermouth1992 in #4461
  • [recipe] feat: Modify the way of obtaining default_runtime_env by @xichengpro in #4468
  • [rollout] fix: mlflow consecutive slashes by @BaiqingL in #4446
  • [fsdp] fix: reward model also reads override config attn_implementation by @pengwu22 in #4458
  • [vllm] fix: compatible to vllm0.12 by @ISEEKYAN in #4473
  • [model] feat: support manual control load/offload by @vermouth1992 in #4472
  • [ci] feat: Update e2e_ascend to improve CI execution efficiency by @FightingZhen in #4477
  • [ci] fix: Fix e2e_ascend sft test case error by @FightingZhen in #4481
  • [trainer] feat: support moving ppo actor logics to single controller by @vermouth1992 in #4480
  • [megatron] fix: correct typo in modeling_qwen2_megatron.py by @study8677 in #4486
  • [fsdp] fix: qwen3vlmoe with Monkey patch to fix a bug in transformers 4.57.x by @pengyanai in #4402
  • [ci] fix: fix format check error by @ji-huazhong in #4506
  • [hardware] feat: Auto set device_name to npu for Ascend NPU by @FightingZhen in #4489
  • [trainer] feat: make reward loop disrm default by @yyDing1 in #4466
  • [algo,doc] refactor: rollout correction by @szrlee in #4511
  • [trainer] feat: enable model engine based critic by @vermouth1992 in #4507
  • [vllm, rollout] feat: support reset prefix cache after abort by @PeterSH6 in #4519
  • [ci] chore: remove proxy settings in e2e_ascend by @FightingZhen in #4527
  • [rollout] fix: correct heap-based load balancing in AsyncLLMServerManager by @hellcatCS in #4505
  • [sglang, rollout] feat: delete remaining sglang spmd code by @PeterSH6 in #4523
  • [data] feat: TransferQueue - Add zero-copy serialization support & usage improvement by @0oshowero0 in #4429
  • [rollout] feat: pass agent_data to tool calling by @wuxibin89 in #4469
  • [megatron,ci] chore: update instructions and scripts for LoRA by @HollowMan6 in #4533
  • [megatron] chore: clean legacy code path part 1, make engine use mbridge by default by @ISEEKYAN in #4528
  • [megatron] chore: clean legacy code path part 2, clean legacy CI by @ISEEKYAN in #4529
  • [trainer] fix: model engine vlm multi_modal_inputs to NonTensorStack by @wuxibin89 in #4492
  • [ray] chore: Update Ray version dependency in requirements-npu.txt by @FightingZhen in #4543
  • [ci] chore: migrate all rm related ci to reward loop by @yyDing1 in #4520
  • [algo] fix: Add seq mean mask denominator option by @szrlee in #4510
  • [trainer] fix: change name for reward loop worker override by @ryxli in #4549
  • [rollout,vllm] feat: disable sleep mode in fully-async mode by @chenjiaoAngel in #4521
  • [rollout, trainer] feat: extend agent loop for custom implementations by @JoyboyBrian in #4548
  • [rollout] chore: update reward loop file names by @yyDing1 in #4547
  • [ci] fix: Add mbridge dependency into e2e_ascend by @FightingZhen in #4560
  • [doc] feat: add JupyterLab plugin instructions by @yqsstudy in #4536
  • [ci] feat: Increase e2e_sft timeout from 30 to 40 minutes by @vermouth1992 in #4552
  • [misc] chore: add "reward" tag to PR template by @yyDing1 in #4573
  • [BREAKING][recipe, ckpt] feat: support parameter sync by checkpoint-engine. only for fully_async mode. by @zpltys in #4427
  • [training_utils] fix: fix model enum acquire logic error in registry by @FightingZhen in #4577
  • [megatron] feat: add script for qwen3next training by @ISEEKYAN in #4582
  • [ci] fix: exclude FSDP-related source files from Megatron CI by @zzhbrr in #4574
  • [reward,ci] fix: cast by @tongyx361 in #4594
  • [vllm] feat: TensorLoRARequest support newer vLLM versions by @HollowMan6 in #4606
  • [misc] feat: always use robust get_event_loop by @tongyx361 in #4603
  • [trainer] feat: Implemented VeomniEngine as a alternative training backend by @A1waysBeenHere in #4072
  • [perf] fix: modify the NPU profiler default configuration by @tardis-key in #4475
  • [megatron] feat: support discrete profiling for mindspeed by @tardis-key in #4271
  • [doc] chore: update LoRA docs with megatron guidelines by @HollowMan6 in #4565
  • [reward] feat: Optimize reward computation when use_reward_loop=True by @none0663 in #4581
  • [rollout] chore: rename reward loop class name and update ci by @yyDing1 in #4572
  • [log] fix: fix wandb log validate run error on async-tool by @chenjiaoAngel in #4591
  • [sglang] fix: warmup_thread_args->warmup_thread_kwargs in aync_sglang_server.py by @EduardDurech in #4617
  • [reward] feat: use load_extern_object in get_custom_reward_fn, supporting pkg path by @tongyx361 in #4615
  • [vllm] fix: correctly pass params to from_lora_tensors in vLLM 0.12.0 by @HollowMan6 in #4614
  • [reward,doc] feat: enrich the reward loop documentation by @yyDing1 in #4619
  • [megatron] fix: fix MLA with sequence packing + CP by @wuweiqiang24 in #4611
  • [megatron, doc] refactor: update the megatron doc by @ISEEKYAN in #4630
  • [reward] feat: add retry to the request post method in the reward loop by @yyDing1 in #4628
  • [vllm] fix: LoRAModel import path change for vLLM 0.13.0 by @HollowMan6 in #4631
  • [misc] refactor: refactor flops counter by @vermouth1992 in #4633
  • [misc] feat: add importlib option to import external reward loop module by @PeterSH6 in #4635
  • [rollout] feat: ensure max_new_tokens is set correctly in sampling_params by @yanyc428 in #4634
  • [recipe] feat: accelerate rollout via model-free speculative decoding by @He-Jingkai in #4535
  • [training_utils] feat: use TMA to load Tiles in linear_cross_entropy kernels by @CtfGo in #4576
  • [data] feat: Add multimodal dataset fliter for user-customized results by @Kite0011 in #4608
  • [vllm] feat: Support online quant for rollout with torchao by @jerryzh168 in #3084
  • [misc] feat: Update news section in README.md by @vermouth1992 in #4646
  • [algo] feat: add cispo by @xvlincaigou in #4508
  • [data] feat: TransferQueue - remove redundant data collect for both TQ and DataProto by @0oshowero0 in #4618
  • [recipe, perf] feat: add nsys profiler support for env worker by @chenchaoxu7575 in #4463
  • [worker] fix: Add profiler initialization for ActorRolloutRefWorker in engine_worker by @pqhgit in #4586
  • [recipe, megatron, fsdp] fix: checkpoint-engine fix trainer param offload in fully-async mode by @zpltys in #4655
  • [doc] feat: Add fine-grained profiling tutorial for FSDP and Megatron on Ascend by @mengchengTang in #4610
  • [misc] feat: .git-blame-ignore-revs for large but non-informative commits by @tongyx361 in #4661
  • [doc] feat: Add OpenTinker to awesome work list by @zhusq20 in #4669
  • [fsdp] feat: Support zero2 optional feature for FSDP1 by @ZLiao097 in #4659
  • [rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario by @PeterSH6 in #4668
  • [data] feat: TransferQueue - Support sync TransferQueue client & optimize clear interface and validation procedure by @0oshowero0 in #4660
  • [misc] fix: .git-blame-ignore-revs file is invalid by @HollowMan6 in #4674
  • [training_utils] fix: no allocator set when using TMA for kernels by @HollowMan6 in #4676
  • [fsdp] fix: replicate ref compute_log_prob (disable calculate_entropy ...) in LoRA by @HollowMan6 in #4675
  • [algo] SAPO algo by Qwen by @BounharAbdelaziz in #4345
  • [megatron] fix: megatron async save ckpt fix by @Leem-Li in #4638
  • [ci] fix: fix config by @vermouth1992 in #4685
  • Revert "[rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario" by @vermouth1992 in #4687
  • [trainer, fsdp, megatron] feat: support one_step_off_policy on Ascend NPU by @baymax591 in #4686
  • [ci] test: add one step off policy test cases for npu by @ji-huazhong in #4485
  • fix(lora): use TOKEN_CLS task type for Critic model by @yurekami in #4695
  • fix: correct enable_activation_offload config parameter name by @yurekami in #4692
  • [misc] fix: deprecate rollout.mode config option by @yurekami in #4690
  • [ci] feat: Set Megatron related environment variable with ENV in Ascend dockerfile by @FightingZhen in #4699
  • [docker] feat: update stable image to vllm==0.12.0, sglang==0.5.6 by @Begunner in #4653
  • [rollout] fix: use configured response_length as default max_tokens in vLLM async server by @yurekami in #4703
  • [megatron] fix: set model to eval during compute_log_prob/compute_values by @HollowMan6 in #4708
  • [trainer] fix: fallback vision tower to flash_attention_2 for Qwen2.5-VL when u… by @aoshen524 in #4670
  • [docker] fix: new images for sgl056 and vllm012 have compatibility issues by @Begunner in #4714
  • [docs] feat: improve docstrings in tensordict_utils.py (#1345) by @yurekami in #4732
  • [rollout,docs] fix: improve error message (#4682) and docstrings (#1345) by @yurekami in #4729
  • docs: fix typos in code comments and messages by @yurekami in #4724
  • [training_utils] fix: RM extra scaling in KL/PG losses by @JacobHelwig in #4711
  • [deployment] feat: support build docker image with aarch64 platform by @rainj-me in #4605
  • [megatron] fix: Bump Megatron-Bridge commit for PEFT recompute by @HollowMan6 in #4702
  • [docs] feat: improve docstrings in seqlen_balancing.py (#1345) by @yurekami in #4731
  • [doc] feat: improve docstrings in torch_functional.py (#1345) by @yurekami in #4730
  • [reward] fix: make RateLimitedRewardManager accept legacy kwargs by @JoyboyBrian in #4739
  • [perf] feat: support profiler in model engine and sft trainer by @vermouth1992 in #4749
  • [ci] test: move cpu tests to volcengine machines by @Begunner in #4738
  • [trainer,megatron] fix: super tiny fix the issue of repeatedly importing the mindspeed patch by @ji-huazhong in #4751
  • [perf]feat: GPT-OSS mfu compute support by @mikequan0425 in #4750
  • [tool] fix: attach to existing MLflow run when MLFLOW_RUN_ID is set by @dubin555 in #4740
  • [trainer] fix: use dp_size instead of world_size in _balance_batch by @yurekami in #4697
  • [rollout] feat: Add vllm logprob mode and default processed_logprob by @RobotGF in #4755
  • [ci] fix: fix precommit by @vermouth1992 in #4760
  • [ci] test: migrate sft test cases on npu to model engine implementation by @ji-huazhong in #4762
  • [doc, cfg] fix: correct typos in training and docker configurations by @Racktic in #4767
  • [vllm] fix: use packaging.version for correct semantic version comparison by @Racktic in #4768
  • Revert "[algo] fix: Add seq mean mask denominator option" by @wuxibin89 in #4769
  • [data] feat: major refactor RLHFDataset for multi-modal data by @wuxibin89 in #4759
  • [perf] feat: add remote reward manager and fix math verify issue by @yyDing1 in #4752
  • [trainer] feat: enable ray-based sft trainer on ascend npu by @ji-huazhong in #4764
  • [training_utils] fix: Nested tensor micro-batching by @JacobHelwig in #4776
  • [worker] fix: Config for PPO batch size by @JacobHelwig in #4773
  • [ci] fix: fix cpu unit test by @vermouth1992 in #4774
  • [cfg] chore: remove redundant fields and fix typo by @JoyboyBrian in #4754
  • [worker] fix: Model engine parameter offload by @JacobHelwig in #4777
  • [fsdp] feat: integrate TiledMLP for memory-efficient MLP computation by @kevssim in #4649
  • [doc] chore: Update ascend_quick_start.rst by @wucong25 in #4609
  • [sglang, vllm, rollout] fix: use model's max_position_embeddings for max_model_len by @PeterSH6 in #4779
  • [doc] fix: reward_loop enable flag name by @zhuangqh in #4788
  • [doc] feat: add v0.7 release blog by @wuxibin89 in #4796

New Contributors

  • @gongyisheng made their first contribution in #4125
  • @Annarine made their first contribution in #4130
  • @HelloWorldBeginner made their first contribution in #4153
  • @wlhgtc made their first contribution in #4156
  • @sl-1314 made their first contribution in #4067
  • @kerrickstaley made their first contribution in #4171
  • @JoyboyBrian made their first contribution in #4107
  • @shevateng0 made their first contribution in #4139
  • @ashvinnihalani made their first contribution in #4091
  • @johnjunjun7 made their first contribution in #3427
  • @zjchenn made their first contribution in #4184
  • @HzZHoO made their first contribution in #4183
  • @EricMarcus-ai made their first contribution in #4185
  • @Shiguang-Guo made their first contribution in #4187
  • @Agoniii made their first contribution in #3519
  • @jQizhang made their first contribution in #4222
  • @JobQiu made their first contribution in #4248
  • @momo609 made their first contribution in #4166
  • @Kite0011 made their first contribution in #4250
  • @LLLLxmmm made their first contribution in #4175
  • @jprellberg made their first contribution in #4196
  • @chengminhua made their first contribution in #4209
  • @Leem-Li made their first contribution in #4253
  • @nuerxiati made their first contribution in #4165
  • @litianjian made their first contribution in #4101
  • @appletea233 made their first contribution in #4410
  • @jsfanfanfan made their first contribution in #4408
  • @icerain-alt made their first contribution in #4406
  • @Lokiscripter made their first contribution in #4398
  • @HanlinDu made their first contribution in #4433
  • @5082459 made their first contribution in #4432
  • @DaizeDong made their first contribution in #4396
  • @ryxli made their first contribution in #4423
  • @study8677 made their first contribution in #4486
  • @pengyanai made their first contribution in #4402
  • @hellcatCS made their first contribution in #4505
  • @yqsstudy made their first contribution in #4536
  • @zpltys made their first contribution in #4427
  • @zzhbrr made their first contribution in #4574
  • @wuweiqiang24 made their first contribution in #4611
  • @yanyc428 made their first contribution in #4634
  • @He-Jingkai made their first contribution in #4535
  • @CtfGo made their first contribution in #4576
  • @jerryzh168 made their first contribution in #3084
  • @xvlincaigou made their first contribution in #4508
  • @chenchaoxu7575 made their first contribution in #4463
  • @pqhgit made their first contribution in #4586
  • @mengchengTang made their first contribution in #4610
  • @zhusq20 made their first contribution in #4669
  • @BounharAbdelaziz made their first contribution in #4345
  • @yurekami made their first contribution in #4695
  • @Begunner made their first contribution in #4653
  • @JacobHelwig made their first contribution in #4711
  • @rainj-me made their first contribution in #4605
  • @mikequan0425 made their first contribution in #4750
  • @dubin555 made their first contribution in #4740
  • @RobotGF made their first contribution in #4755
  • @Racktic made their first contribution in #4767
  • @wucong25 made their first contribution in #4609
  • @zhuangqh made their first contribution in #4788

Full Changelog: v0.6.1...v0.7.0
