verl-project/verl v0.7.0

24 days ago

v0.7 release

Blog post: verl 0.7 release blog

Highlights

Model Engine

  • Integrate Megatron-Bridge and support LoRA/PEFT; see the blog post: How We Build Trillion Parameter Reasoning RL with 10% GPUs
  • Support experimental FP8 training for the Megatron backend
  • Support new models for the Megatron backend: GPT-OSS, Qwen3-Next
  • Comprehensive support for the new model engine; the FSDP and Megatron engines are production-ready.
    • Dispatch TensorDict with nested tensors instead of padded DataProto
    • Add TrainingWorker with a Tinker-like API
    • Add VLM support to the model engine, SFT trainer, and RL trainer
    • Add a model-engine-based critic
    • Implement ActorRolloutRefWorker on top of TrainingWorker, supporting different backends in one worker
  • Add a new VeOmni engine, still in alpha status.
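The switch from padded DataProto to nested-tensor dispatch avoids moving padding tokens between workers. A minimal PyTorch sketch of the difference (illustrative only; verl's actual DataProto/TensorDict plumbing differs):

```python
import torch

# Three variable-length "sequences", e.g. tokenized responses.
seqs = [torch.arange(3), torch.arange(5), torch.arange(2)]

# Padded batch: every row is stretched to the max length (5),
# so shorter rows carry wasted padding positions.
padded = torch.nn.utils.rnn.pad_sequence(seqs, batch_first=True)
assert padded.shape == (3, 5)

# Nested tensor: each row keeps its true length, so dispatching
# the batch between workers moves no padding at all.
nested = torch.nested.nested_tensor(seqs)
assert nested.is_nested
assert [t.numel() for t in nested.unbind()] == [3, 5, 2]
```

With large rollout batches of very uneven response lengths, the padded representation can be dominated by padding, which is what the nested-tensor dispatch avoids.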

Rollout Engine

  • Remove the SPMD rollout mode
  • Support blockwise FP8 rollout for vllm and sglang; support online quantization for vllm with TorchAO
  • Experimental router-replay support for vllm
  • Optimize multi-modal data fetching and preprocessing; support video input
  • Upgrade to vllm==0.12.0 and sglang==0.5.6

Reward

  • Support hybrid reward scenarios, including generative, discriminative, and rule-based rewards, and their combinations.
  • Refactor reward models into server mode, supporting both colocated and standalone deployments.
  • Introduce new reward managers for more complex scenarios: a limited mode for request-rate control and a remote mode for CPU-intensive tasks.
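The limited mode amounts to capping concurrent calls to an external reward judge. A minimal asyncio sketch of the idea (the class and argument names here are hypothetical, not verl's actual reward-manager API):

```python
import asyncio

class RateLimitedRewardClient:
    """Cap in-flight requests to a remote reward judge with a semaphore.

    Illustrative sketch only -- class and argument names are
    hypothetical, not verl's reward-manager API.
    """

    def __init__(self, score_fn, max_concurrency: int = 2):
        self._sem = asyncio.Semaphore(max_concurrency)
        self._score_fn = score_fn

    async def score(self, prompt: str, response: str) -> float:
        async with self._sem:  # blocks until a request slot frees up
            return await self._score_fn(prompt, response)

async def main():
    async def fake_judge(prompt, response):
        await asyncio.sleep(0.01)  # pretend remote-API latency
        return float(len(response))

    client = RateLimitedRewardClient(fake_judge, max_concurrency=2)
    return await asyncio.gather(
        *(client.score(f"p{i}", "ok" * i) for i in range(5)))

rewards = asyncio.run(main())
assert rewards == [0.0, 2.0, 4.0, 6.0, 8.0]
```

All five requests are submitted at once, but at most two are ever in flight, which is the property a rate-limited manager needs when the reward is an external API with quota limits.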

Algorithm

  • Add CISPO: Clipped IS-weight Policy Optimization
  • Add SAPO: Soft Adaptive Policy Optimization
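CISPO changes the PPO-style objective: instead of clipping the ratio and zeroing the gradient of clipped tokens, it clips and detaches the importance-sampling weight, so every token keeps a policy-gradient term. A minimal sketch (hyperparameter names and defaults are illustrative; consult the CISPO paper and verl's config for the exact conventions):

```python
import torch

def cispo_loss(log_prob, old_log_prob, advantages,
               clip_low: float = 1.0, clip_high: float = 0.2):
    """Sketch of the CISPO objective: clip and detach the IS weight,
    keeping a log-prob gradient for every token (PPO-style clipping
    would mask out clipped tokens' gradients instead).

    A large clip_low effectively disables lower clipping; the names
    and defaults here are illustrative, not verl's config keys.
    """
    ratio = torch.exp(log_prob - old_log_prob)
    weight = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high).detach()
    return -(weight * advantages * log_prob).mean()

log_prob = torch.tensor([-1.0, -2.0], requires_grad=True)
old_log_prob = torch.tensor([-1.0, -1.0])
advantages = torch.tensor([1.0, 1.0])

loss = cispo_loss(log_prob, old_log_prob, advantages)
loss.backward()
assert abs(loss.item() - 0.8678794) < 1e-4  # -(1*(-1) + e^-1*(-2)) / 2
assert torch.all(log_prob.grad != 0)  # no token's gradient is masked out
```

Because the clipped weight is detached, clipping only caps how much off-policy tokens are up-weighted; it never removes their contribution to the gradient, which matters for long reasoning rollouts where many tokens drift off-policy.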

Recipe

  • [NEW] VLA: add experimental support for VLA models
  • [NEW] rhymerl: History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
  • TransferQueue: support multiple data partitions and optimize tensor zero-copy serialization
  • One-step-off-policy / fully async: optimize weight synchronization via a checkpoint engine with bucketing and pipelining support.

What's Changed

  • [data] fix: MultiturnSFTDataset handle messages with list args in tool call by @gongyisheng in #4125
  • [ci, doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version by @FightingZhen in #4123
  • [data] fix: fix global_seqlen metric by @conver334 in #4129
  • [ci] fix: Optimize ascend docker build workflow and dockerfile to solve OOM problem by @FightingZhen in #4137
  • [ci] fix: fix error limiting MindSpeed cloning depth to one by @FightingZhen in #4140
  • [ci] feat: specify torch and torch_npu version into ascend dockerfile by @FightingZhen in #4141
  • [ci] fix: move torch and torch_npu install order in ascend dockerfile to ensure installed version correct by @FightingZhen in #4142
  • [ci] fix: Correct version relationship between torch and torchvision in ascend dockerfile by @FightingZhen in #4143
  • [doc] chore: Add one_step_off_policy support doc of Ascend NPU by @baymax591 in #4151
  • [rollout] fix: resource pool name in standalone mode by @PeterSH6 in #4149
  • [ci] feat: Update e2e_ascend CI image to 8.3.RC1 version, remove weekly validation workflow by @FightingZhen in #4146
  • [doc] chore: add pytorch conference materials by @hongpeng-guo in #4161
  • [rollout] fixup load_format=dummy update_weights not do process_weight… by @Annarine in #4130
  • [vllm] fix: Change parameter validation to align with vllm validation by @HelloWorldBeginner in #4153
  • [trainer] fix: reproducible problem when resume training by @wlhgtc in #4156
  • [recipe, tool] feat: support multi-turn and tool call for recipe/fully_async_policy by @sl-1314 in #4067
  • [cfg] fix: add rollout_correcton config field with omegaconf.open_dict by @tongyx361 in #4167
  • [doc] fix: Misc doc fixes by @kerrickstaley in #4171
  • [recipe] feat: add qwen3 8b grpo one_step_off_policy script on ASCEND NPU by @baymax591 in #4163
  • [BREAKING][rollout] feat: change rollout to server mode by default by @wuxibin89 in #4106
  • [algo] feat: Add RateLimitedRewardLoopManager with three-layer rate limiting for API-based rewards by @JoyboyBrian in #4107
  • [megatron] feat: load dist checkpoint with customized prefix for state dict keys. by @shevateng0 in #4139
  • [megatron] fix: Use tokenizer path or model path in config by @ashvinnihalani in #4091
  • [doc] chore: update docker installation guide by @wuxibin89 in #4155
  • [recipe] feat: DeepSeek-R1-Zero on Ascend NPU by @johnjunjun7 in #3427
  • [recipe] fix: compatibility with vLLM Qwen3Next model by @zjchenn in #4184
  • [recipe] fix: readme in recipe/r1_ascend by @HzZHoO in #4183
  • [recipe] fix: ReactAgentLoop error handling for failed LangGraph invocations by @le-czs in #4182
  • [ci] chore: Update e2e_ascend CI trigger policy by @FightingZhen in #4189
  • [recipe] fix: Qwen3-vl npu patch by @leisuzz in #4186
  • [rollout, doc] feat: limit tracing samples by @EricMarcus-ai in #4185
  • [worker, sglang] fix: Rename the file sglang_router.py to avoid circular imports by @Shiguang-Guo in #4187
  • [megatron, recipe] fix: error of megatron init while detached actor and rollout by @lalala-2 in #4179
  • [ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI by @wlf-darkmatter in #3465
  • [rollout, vllm] feat: support blockwise fp8 rollout by @Agoniii in #3519
  • [ci] feat: Move hf_transfer dependency to requirement file by @FightingZhen in #4210
  • [misc] feat: init random model supports custom code in the model by @HollowMan6 in #4217
  • [single_controller] feat: support dispatch tensordict by @vermouth1992 in #4213
  • [recipe, doc, ckpt] fix: error of ckpt in fully async by @lalala-2 in #4199
  • [megatron] feat: FP8 training by @ISEEKYAN in #4223
  • [megatron] feat: moe fp16 training by @HaochenYuan in #4158
  • [recipe] fix: incorrect reward function in fapo scripts by @yyDing1 in #4195
  • [rollout, vllm] feat: support blockwise FP8 rollout for vLLM v0.11 MoE RL by @jQizhang in #4222
  • [single_controller] feat: support multiple replicate worker in one resource pool by @yyDing1 in #4226
  • [megatron] fix: BF16 mode should use PAO as well by @ashvinnihalani in #4221
  • Revert "[megatron] fix: BF16 mode should use PAO as well" by @ISEEKYAN in #4234
  • [doc] feat: Add Search Self-Play to awesome work list by @Necolizer in #4245
  • [worker] feat: add support for colocate replicas by @yyDing1 in #4233
  • [trainer] feat: refactor workers with model engine by @wuxibin89 in #4211
  • [single_controller] feat: support resource pool split method by @yyDing1 in #4251
  • [recipe] fix: tighten async rollouter task handling by @le-czs in #4230
  • Revert "[single_controller] feat: support resource pool split method" by @vermouth1992 in #4258
  • Revert "[worker] feat: add support for colocate replicas" by @wuxibin89 in #4259
  • Revert "[single_controller] feat: support multiple replicate worker in one resource pool" by @vermouth1992 in #4260
  • [ci] fix: Fix triton-ascend unavailable error in Ascend dockerfile by @FightingZhen in #4254
  • [ci] fix: Fix error in ascend dockerfile by @FightingZhen in #4265
  • [rollout] fix: ensure weight sync regardless of free_cache_engine by @JobQiu in #4248
  • [doc] feat: add rollout&train consistency doc for Ascend Platform by @momo609 in #4166
  • [recipe] feat: allow customize agent name by @vermouth1992 in #4269
  • [ci] fix: Remove redundant uninstall command in e2e_ascend by @FightingZhen in #4267
  • [megatron] Fix: fix bugs in mcore backend context-parallel code logic by @Kite0011 in #4250
  • [recipe] feat: add Experimental VLA RL Support by @The-Hierophant in #3918
  • [recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller by @LLLLxmmm in #4175
  • [ci] feat: Increase e2e_sft timeout from 25 to 30 minutes by @vermouth1992 in #4279
  • [megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT by @HollowMan6 in #4063
  • [single_controller] feat: support resource_pool split by @yyDing1 in #4273
  • [recipe] feat: move recipes to new repository verl-recipe by @wuxibin89 in #4283
  • [worker] feat: restore colocate workers based on new splited resource pool by @yyDing1 in #4282
  • [misc] feat: Add actor_rollout_ref.actor.calculate_entropy for entropy fwd by @EduardDurech in #4239
  • [trainer] feat: Self-Normalized Importance Sampling by @EduardDurech in #3980
  • [ci, megatron] fix: add rotary_pos_cos_sin to forward by @HollowMan6 in #4291
  • [megatron] fix: pass trust_remote_code to get_generation_config by @jprellberg in #4196
  • [misc] fix: support nested datastructure in dataproto to convert to tensordict by @PeterSH6 in #4296
  • [ci] fix: use local hf model path by @wuxibin89 in #4299
  • [data] feat: TransferQueue - Support AgentLoop performance metrics & minor fix by @0oshowero0 in #4289
  • [recipe] feat: support reward_loop for recipe/fully_async_policy by @sl-1314 in #4224
  • [misc] fix: fix list conversion in get_tensordict by @PeterSH6 in #4304
  • [hardware] fix: Workaround for torch-npu's lack of support for creating nested tensors from NPU tensors. by @ji-huazhong in #4309
  • [rollout] fix: some compatibility changes in agent loop and reward by @pengwu22 in #4293
  • [worker] fix: do not pass router address and tokenizer is their value is none by @yyDing1 in #4310
  • [doc] chore: Update ascend quickstart doc by @FightingZhen in #4321
  • [misc] feat: add more utils of tensordict by @vermouth1992 in #4322
  • [recipe] fix: Fixed scripts for one_step_off_policy async not implemention by @baymax591 in #4350
  • [model] feat: refactor engine folder structure by @vermouth1992 in #4352
  • [recipe] feat: move char count recipe to verl-recipe by @vermouth1992 in #4351
  • [ci] chore: switch ascend ci calculation resource by @FightingZhen in #4347
  • feat(actor): add loss_scale_factor for seq-mean-token-sum-norm mode by @szrlee in #4360
  • [misc] refactor: clean up unused sharding managers by @ji-huazhong in #4361
  • [worker] feat: Add TrainingWorker that resembles Tinker-like API by @vermouth1992 in #4371
  • [vllm] fix: Fix issues that occur during the ACLGraph initialization process in the NPU. by @chengminhua in #4209
  • [megatron] feat: support gpt-oss by @ISEEKYAN in #4323
  • [megatron] fix: megatron async save ckpt fix by @Leem-Li in #4253
  • [misc] feat: Update news section in README.md by @vermouth1992 in #4385
  • [misc] fix: handle empty TensorDict in DataProto serialization by @le-czs in #4379
  • [trainer,fsdp] feat: enable reproducibility for training by @ji-huazhong in #4378
  • [trainer] feat: support ray-based sft trainer by @vermouth1992 in #4382
  • [megatron] feat: optimize the mbridge checkpoint saving speed by @ISEEKYAN in #4386
  • [rollout] feat: add support for discriminative reward model in reward loop by @yyDing1 in #4358
  • [recipe] feat: refactor one step off to support server mode by @ArronHZG in #4307
  • [misc] feat: support TensorDict in DataProtoFuture by @vermouth1992 in #4395
  • [fsdp] fix: Fixing the error caused by empty tensors in the multi_turn + remove_padding scenario by @nuerxiati in #4165
  • [doc] fix: add Geo-RS-Seq-TIS estimators and update documentation by @szrlee in #4359
  • [worker] feat: custom master addr port by @tongyx361 in #4389
  • [doc] feat: update reward loop document by @yyDing1 in #4404
  • [algo] feat: support router replay by @litianjian in #4101
  • [recipe] fix: FlowRL actor to pure implementation by @Xuekai-Zhu in #4397
  • [doc] feat: add more user instructions to reward loop doc by @yyDing1 in #4409
  • [doc] feat: add OneThinker link in readme by @appletea233 in #4410
  • [ci] fix: NPU not support router replay by @wuxibin89 in #4414
  • [worker] feat: custom reward_manager by @tongyx361 in #4387
  • [vllm] feat: retires vllm spmd mode in the codebase by @PeterSH6 in #4411
  • [sglang] fix: HTTP server startup issues for Prometheus and Grafana integration by @jsfanfanfan in #4408
  • [doc] chore: Update ascend quickstart and docker build guidance doc by @FightingZhen in #4420
  • [sglang] feat: retires sglang spmd mode in the codebase by @PeterSH6 in #4422
  • [fsdp] feat: update NPU fused kernels for Qwen3 moe block by @icerain-alt in #4406
  • [misc] refactor: clean up unused sharding manager by @ji-huazhong in #4439
  • [hardware] chore: clean npu_patch by @FightingZhen in #4436
  • [misc] fix: fix memory leakage when initializing multiple tools by @PeterSH6 in #4430
  • [trainer, vllm, megatron, recipe] feat: one/two step off async on-policy distillation recipe by @moehanabi in #3975
  • [misc] feat: optimize performance of index_select_tensor_dict by @vermouth1992 in #4444
  • [ci] test: Disable ReMax training test in vllm workflow by @PeterSH6 in #4445
  • [rollout] fix: RolloutConfig should support repetition_penalty config… by @Lokiscripter in #4398
  • [recipe] feat: add fully async comm between rollout and sim node in disagg mode by @HanlinDu in #4433
  • [misc] feat: optimize nested tensor index by @vermouth1992 in #4447
  • [model] feat: add qwen3-4b grpo script on ASCEND NPU A3 by @5082459 in #4432
  • [megatron] fix: Remove Deprecated Megatron Optimizer Args by @DaizeDong in #4396
  • [megatron] fix: respect use_distributed_optimizer in config by @HollowMan6 in #4392
  • [recipe, ci] fix: remove batch mode for remote generative reward model by @yyDing1 in #4448
  • [misc] feat: optimize rearrange_micro_batches by @vermouth1992 in #4451
  • [rollout, sglang] feat: support blockwise fp8 rollout by @Agoniii in #4415
  • [trainer] feat: model engine sft trainer support vlm model by @wuxibin89 in #4403
  • [trainer] feat: add reward loop config to default config by @yyDing1 in #4452
  • [vllm] feat: support abort generating requests in vllm server by @PeterSH6 in #4453
  • [ci] chore: cleanup some ci workflow by @wuxibin89 in #4459
  • [trainer] feat: allow override for reward_manager_worker in agent loop by @ryxli in #4423
  • [model] feat: enhances TrainingWorker by @vermouth1992 in #4461
  • [recipe] feat: Modify the way of obtaining default_runtime_env by @xichengpro in #4468
  • [rollout] fix: mlflow consecutive slashes by @BaiqingL in #4446
  • [fsdp] fix: reward model also reads override config attn_implementation by @pengwu22 in #4458
  • [vllm] fix: compatible to vllm0.12 by @ISEEKYAN in #4473
  • [model] feat: support manual control load/offload by @vermouth1992 in #4472
  • [ci] feat: Update e2e_ascend to improve CI execution efficiency by @FightingZhen in #4477
  • [ci] fix: Fix e2e_ascend sft test case error by @FightingZhen in #4481
  • [trainer] feat: support moving ppo actor logics to single controller by @vermouth1992 in #4480
  • [megatron] fix: correct typo in modeling_qwen2_megatron.py by @study8677 in #4486
  • [fsdp] fix: qwen3vlmoe with Monkey patch to fix a bug in transformers 4.57.x by @pengyanai in #4402
  • [ci] fix: fix format check error by @ji-huazhong in #4506
  • [hardware] feat: Auto set device_name to npu for Ascend NPU by @FightingZhen in #4489
  • [trainer] feat: make reward loop disrm default by @yyDing1 in #4466
  • [algo,doc] refactor: rollout correction by @szrlee in #4511
  • [trainer] feat: enable model engine based critic by @vermouth1992 in #4507
  • [vllm, rollout] feat: support reset prefix cache after abort by @PeterSH6 in #4519
  • [ci] chore: remove proxy settings in e2e_ascend by @FightingZhen in #4527
  • [rollout] fix: correct heap-based load balancing in AsyncLLMServerManager by @hellcatCS in #4505
  • [sglang, rollout] feat: delete remaining sglang spmd code by @PeterSH6 in #4523
  • [data] feat: TransferQueue - Add zero-copy serialization support & usage improvement by @0oshowero0 in #4429
  • [rollout] feat: pass agent_data to tool calling by @wuxibin89 in #4469
  • [megatron,ci] chore: update instructions and scripts for LoRA by @HollowMan6 in #4533
  • [megatron] chore: clean legacy code path part 1, make engine use mbridge by default by @ISEEKYAN in #4528
  • [megatron] chore: clean legacy code path part 2, clean legacy CI by @ISEEKYAN in #4529
  • [trainer] fix: model engine vlm multi_modal_inputs to NonTensorStack by @wuxibin89 in #4492
  • [ray] chore: Update Ray version dependency in requirements-npu.txt by @FightingZhen in #4543
  • [ci] chore: migrate all rm related ci to reward loop by @yyDing1 in #4520
  • [algo] fix: Add seq mean mask denominator option by @szrlee in #4510
  • [trainer] fix: change name for reward loop worker override by @ryxli in #4549
  • [rollout,vllm] feat: disable sleep mode in fully-async mode by @chenjiaoAngel in #4521
  • [rollout, trainer] feat: extend agent loop for custom implementations by @JoyboyBrian in #4548
  • [rollout] chore: update reward loop file names by @yyDing1 in #4547
  • [ci] fix: Add mbridge dependency into e2e_ascend by @FightingZhen in #4560
  • [doc] feat: add JupyterLab plugin instructions by @yqsstudy in #4536
  • [ci] feat: Increase e2e_sft timeout from 30 to 40 minutes by @vermouth1992 in #4552
  • [misc] chore: add "reward" tag to PR template by @yyDing1 in #4573
  • [BREAKING][recipe, ckpt] feat: support parameter sync by checkpoint-engine. only for fully_async mode. by @zpltys in #4427
  • [training_utils] fix: fix model enum acquire logic error in registry by @FightingZhen in #4577
  • [megatron] feat: add script for qwen3next training by @ISEEKYAN in #4582
  • [ci] fix: exclude FSDP-related source files from Megatron CI by @zzhbrr in #4574
  • [reward,ci] fix: cast by @tongyx361 in #4594
  • [vllm] feat: TensorLoRARequest support newer vLLM versions by @HollowMan6 in #4606
  • [misc] feat: always use robust get_event_loop by @tongyx361 in #4603
  • [trainer] feat: Implemented VeomniEngine as a alternative training backend by @A1waysBeenHere in #4072
  • [perf] fix: modify the NPU profiler default configuration by @tardis-key in #4475
  • [megatron] feat: support discrete profiling for mindspeed by @tardis-key in #4271
  • [doc] chore: update LoRA docs with megatron guidelines by @HollowMan6 in #4565
  • [reward] feat: Optimize reward computation when use_reward_loop=True by @none0663 in #4581
  • [rollout] chore: rename reward loop class name and update ci by @yyDing1 in #4572
  • [log] fix: fix wandb log validate run error on async-tool by @chenjiaoAngel in #4591
  • [sglang] fix: warmup_thread_args->warmup_thread_kwargs in aync_sglang_server.py by @EduardDurech in #4617
  • [reward] feat: use load_extern_object in get_custom_reward_fn, supporting pkg path by @tongyx361 in #4615
  • [vllm] fix: correctly pass params to from_lora_tensors in vLLM 0.12.0 by @HollowMan6 in #4614
  • [reward,doc] feat: enrich the reward loop documentation by @yyDing1 in #4619
  • [megatron] fix: fix MLA with sequence packing + CP by @wuweiqiang24 in #4611
  • [megatron, doc] refactor: update the megatron doc by @ISEEKYAN in #4630
  • [reward] feat: add retry to the request post method in the reward loop by @yyDing1 in #4628
  • [vllm] fix: LoRAModel import path change for vLLM 0.13.0 by @HollowMan6 in #4631
  • [misc] refactor: refactor flops counter by @vermouth1992 in #4633
  • [misc] feat: add importlib option to import external reward loop module by @PeterSH6 in #4635
  • [rollout] feat: ensure max_new_tokens is set correctly in sampling_params by @yanyc428 in #4634
  • [recipe] feat: accelerate rollout via model-free speculative decoding by @He-Jingkai in #4535
  • [training_utils] feat: use TMA to load Tiles in linear_cross_entropy kernels by @CtfGo in #4576
  • [data] feat: Add multimodal dataset fliter for user-customized results by @Kite0011 in #4608
  • [vllm] feat: Support online quant for rollout with torchao by @jerryzh168 in #3084
  • [misc] feat: Update news section in README.md by @vermouth1992 in #4646
  • [algo] feat: add cispo by @xvlincaigou in #4508
  • [data] feat: TransferQueue - remove redundant data collect for both TQ and DataProto by @0oshowero0 in #4618
  • [recipe, perf] feat: add nsys profiler support for env worker by @chenchaoxu7575 in #4463
  • [worker] fix: Add profiler initialization for ActorRolloutRefWorker in engine_worker by @pqhgit in #4586
  • [recipe, megatron, fsdp] fix: checkpoint-engine fix trainer param offload in fully-async mode by @zpltys in #4655
  • [doc] feat: Add fine-grained profiling tutorial for FSDP and Megatron on Ascend by @mengchengTang in #4610
  • [misc] feat: .git-blame-ignore-revs for large but non-informative commits by @tongyx361 in #4661
  • [doc] feat: Add OpenTinker to awesome work list by @zhusq20 in #4669
  • [fsdp] feat: Support zero2 optional feature for FSDP1 by @ZLiao097 in #4659
  • [rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario by @PeterSH6 in #4668
  • [data] feat: TransferQueue - Support sync TransferQueue client & optimize clear interface and validation procedure by @0oshowero0 in #4660
  • [misc] fix: .git-blame-ignore-revs file is invalid by @HollowMan6 in #4674
  • [training_utils] fix: no allocator set when using TMA for kernels by @HollowMan6 in #4676
  • [fsdp] fix: replicate ref compute_log_prob (disable calculate_entropy ...) in LoRA by @HollowMan6 in #4675
  • [algo] SAPO algo by Qwen by @BounharAbdelaziz in #4345
  • [megatron] fix: megatron async save ckpt fix by @Leem-Li in #4638
  • [ci] fix: fix config by @vermouth1992 in #4685
  • Revert "[rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario" by @vermouth1992 in #4687
  • [trainer, fsdp, megatron] feat: support one_step_off_policy on Ascend NPU by @baymax591 in #4686
  • [ci] test: add one step off policy test cases for npu by @ji-huazhong in #4485
  • fix(lora): use TOKEN_CLS task type for Critic model by @yurekami in #4695
  • fix: correct enable_activation_offload config parameter name by @yurekami in #4692
  • [misc] fix: deprecate rollout.mode config option by @yurekami in #4690
  • [ci] feat: Set Megatron related environment variable with ENV in Ascend dockerfile by @FightingZhen in #4699
  • [docker] feat: update stable image to vllm==0.12.0, sglang==0.5.6 by @Begunner in #4653
  • [rollout] fix: use configured response_length as default max_tokens in vLLM async server by @yurekami in #4703
  • [megatron] fix: set model to eval during compute_log_prob/compute_values by @HollowMan6 in #4708
  • [trainer] fix: fallback vision tower to flash_attention_2 for Qwen2.5-VL when u… by @aoshen524 in #4670
  • [docker] fix: new images for sgl056 and vllm012 have compatibility issues by @Begunner in #4714
  • [docs] feat: improve docstrings in tensordict_utils.py (#1345) by @yurekami in #4732
  • [rollout,docs] fix: improve error message (#4682) and docstrings (#1345) by @yurekami in #4729
  • docs: fix typos in code comments and messages by @yurekami in #4724
  • [training_utils] fix: RM extra scaling in KL/PG losses by @JacobHelwig in #4711
  • [deployment] feat: support build docker image with aarch64 platform by @rainj-me in #4605
  • [megatron] fix: Bump Megatron-Bridge commit for PEFT recompute by @HollowMan6 in #4702
  • [docs] feat: improve docstrings in seqlen_balancing.py (#1345) by @yurekami in #4731
  • [doc] feat: improve docstrings in torch_functional.py (#1345) by @yurekami in #4730
  • [reward] fix: make RateLimitedRewardManager accept legacy kwargs by @JoyboyBrian in #4739
  • [perf] feat: support profiler in model engine and sft trainer by @vermouth1992 in #4749
  • [ci] test: move cpu tests to volcengine machines by @Begunner in #4738
  • [trainer,megatron] fix: super tiny fix the issue of repeatedly importing the mindspeed patch by @ji-huazhong in #4751
  • [perf]feat: GPT-OSS mfu compute support by @mikequan0425 in #4750
  • [tool] fix: attach to existing MLflow run when MLFLOW_RUN_ID is set by @dubin555 in #4740
  • [trainer] fix: use dp_size instead of world_size in _balance_batch by @yurekami in #4697
  • [rollout] feat: Add vllm logprob mode and default processed_logprob by @RobotGF in #4755
  • [ci] fix: fix precommit by @vermouth1992 in #4760
  • [ci] test: migrate sft test cases on npu to model engine implementation by @ji-huazhong in #4762
  • [doc, cfg] fix: correct typos in training and docker configurations by @Racktic in #4767
  • [vllm] fix: use packaging.version for correct semantic version comparison by @Racktic in #4768
  • Revert "[algo] fix: Add seq mean mask denominator option" by @wuxibin89 in #4769
  • [data] feat: major refactor RLHFDataset for multi-modal data by @wuxibin89 in #4759
  • [perf] feat: add remote reward manager and fix math verify issue by @yyDing1 in #4752
  • [trainer] feat: enable ray-based sft trainer on ascend npu by @ji-huazhong in #4764
  • [training_utils] fix: Nested tensor micro-batching by @JacobHelwig in #4776
  • [worker] fix: Config for PPO batch size by @JacobHelwig in #4773
  • [ci] fix: fix cpu unit test by @vermouth1992 in #4774
  • [cfg] chore: remove redundant fields and fix typo by @JoyboyBrian in #4754
  • [worker] fix: Model engine parameter offload by @JacobHelwig in #4777
  • [fsdp] feat: integrate TiledMLP for memory-efficient MLP computation by @kevssim in #4649
  • [doc] chore: Update ascend_quick_start.rst by @wucong25 in #4609
  • [sglang, vllm, rollout] fix: use model's max_position_embeddings for max_model_len by @PeterSH6 in #4779
  • [doc] fix: reward_loop enable flag name by @zhuangqh in #4788
  • [doc] feat: add v0.7 release blog by @wuxibin89 in #4796

New Contributors

  • @gongyisheng made their first contribution in #4125
  • @Annarine made their first contribution in #4130
  • @HelloWorldBeginner made their first contribution in #4153
  • @wlhgtc made their first contribution in #4156
  • @sl-1314 made their first contribution in #4067
  • @kerrickstaley made their first contribution in #4171
  • @JoyboyBrian made their first contribution in #4107
  • @shevateng0 made their first contribution in #4139
  • @ashvinnihalani made their first contribution in #4091
  • @johnjunjun7 made their first contribution in #3427
  • @zjchenn made their first contribution in #4184
  • @HzZHoO made their first contribution in #4183
  • @EricMarcus-ai made their first contribution in #4185
  • @Shiguang-Guo made their first contribution in #4187
  • @Agoniii made their first contribution in #3519
  • @jQizhang made their first contribution in #4222
  • @JobQiu made their first contribution in #4248
  • @momo609 made their first contribution in #4166
  • @Kite0011 made their first contribution in #4250
  • @LLLLxmmm made their first contribution in #4175
  • @jprellberg made their first contribution in #4196
  • @chengminhua made their first contribution in #4209
  • @Leem-Li made their first contribution in #4253
  • @nuerxiati made their first contribution in #4165
  • @litianjian made their first contribution in #4101
  • @appletea233 made their first contribution in #4410
  • @jsfanfanfan made their first contribution in #4408
  • @icerain-alt made their first contribution in #4406
  • @Lokiscripter made their first contribution in #4398
  • @HanlinDu made their first contribution in #4433
  • @5082459 made their first contribution in #4432
  • @DaizeDong made their first contribution in #4396
  • @ryxli made their first contribution in #4423
  • @study8677 made their first contribution in #4486
  • @pengyanai made their first contribution in #4402
  • @hellcatCS made their first contribution in #4505
  • @yqsstudy made their first contribution in #4536
  • @zpltys made their first contribution in #4427
  • @zzhbrr made their first contribution in #4574
  • @wuweiqiang24 made their first contribution in #4611
  • @yanyc428 made their first contribution in #4634
  • @He-Jingkai made their first contribution in #4535
  • @CtfGo made their first contribution in #4576
  • @jerryzh168 made their first contribution in #3084
  • @xvlincaigou made their first contribution in #4508
  • @chenchaoxu7575 made their first contribution in #4463
  • @pqhgit made their first contribution in #4586
  • @mengchengTang made their first contribution in #4610
  • @zhusq20 made their first contribution in #4669
  • @BounharAbdelaziz made their first contribution in #4345
  • @yurekami made their first contribution in #4695
  • @Begunner made their first contribution in #4653
  • @JacobHelwig made their first contribution in #4711
  • @rainj-me made their first contribution in #4605
  • @mikequan0425 made their first contribution in #4750
  • @dubin555 made their first contribution in #4740
  • @RobotGF made their first contribution in #4755
  • @Racktic made their first contribution in #4767
  • @wucong25 made their first contribution in #4609
  • @zhuangqh made their first contribution in #4788

Full Changelog: v0.6.1...v0.7.0
