v0.7 release
Blog post: verl 0.7 release blog
Highlights
Model Engine
- Integrate Megatron-Bridge and support LoRA/PEFT, see blog post: How We Build Trillion Parameter Reasoning RL with 10% GPUs
- Support experimental FP8 training for the Megatron backend
- Support new models for the Megatron backend: GPT-OSS, Qwen3-Next
- Comprehensive support for the new model engine; the FSDP and Megatron engines are production ready
- Dispatch tensordict with nested tensor instead of padded DataProto
- Add TrainingWorker, which exposes a Tinker-like API
- Add VLM support to the model engine, SFT trainer, and RL trainer
- Add a model-engine-based critic model
- Implement ActorRolloutRefWorker on top of TrainingWorker, supporting different backends in one worker
- Add new VeOmni engine (still in alpha)
Rollout Engine
- Remove SPMD rollout mode
- Support blockwise FP8 rollout for vLLM and SGLang; support online quantization for vLLM with TorchAO
- Experimental router replay support for vLLM
- Optimize multi-modal data fetching and preprocessing; support video input
- Upgrade to vllm==0.12.0; sglang==0.5.6
Reward
- Support hybrid reward scenarios, including generative, discriminative, rule-based rewards, and their combinations.
- Refactor reward models into server mode, supporting both colocated and standalone deployments.
- Introduce new reward managers for more complex scenarios: a limited mode for request rate control and a remote mode for CPU-intensive tasks
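The limited mode mentioned above caps how many reward requests are in flight at once. A toy sketch of the pattern with an asyncio semaphore (hypothetical class name, not the verl reward-manager API):

```python
import asyncio

class LimitedRewardClient:
    """Toy rate-limited reward client: at most max_concurrency
    reward requests are in flight at any time."""

    def __init__(self, score_fn, max_concurrency: int = 8):
        self._score_fn = score_fn
        self._sem = asyncio.Semaphore(max_concurrency)

    async def score(self, prompt: str, response: str) -> float:
        async with self._sem:  # waits here once the limit is reached
            return await self._score_fn(prompt, response)

async def demo():
    async def remote_score(prompt, response):
        await asyncio.sleep(0.01)  # stand-in for an HTTP reward-model call
        return float(len(response))

    client = LimitedRewardClient(remote_score, max_concurrency=2)
    return await asyncio.gather(*(client.score("p", r) for r in ("a", "bb", "ccc")))

scores = asyncio.run(demo())
print(scores)  # [1.0, 2.0, 3.0]
```

A standalone (remote) deployment would put the scoring function behind a server so CPU-intensive rewards do not compete with training workers for resources.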
Algorithm
Recipe
- [NEW] VLA: add experimental support for VLA models
- [NEW] rhymerl: History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
- TransferQueue: support multiple data partitions and optimize tensor zero-copy serialization
- One-step-off-policy/Fully async: optimize weight synchronization via the checkpoint engine with bucket and pipeline support
What's Changed
- [data] fix: MultiturnSFTDataset handle messages with list args in tool call by @gongyisheng in #4125
- [ci, doc] feat: Update Ascend Dockerfile and docker build workflow to 8.3.RC1 version by @FightingZhen in #4123
- [data] fix: fix global_seqlen metric by @conver334 in #4129
- [ci] fix: Optimize ascend docker build workflow and dockerfile to solve OOM problem by @FightingZhen in #4137
- [ci] fix: fix error limiting MindSpeed cloning depth to one by @FightingZhen in #4140
- [ci] feat: specify torch and torch_npu version into ascend dockerfile by @FightingZhen in #4141
- [ci] fix: move torch and torch_npu install order in ascend dockerfile to ensure installed version correct by @FightingZhen in #4142
- [ci] fix: Correct version relationship between torch and torchvision in ascend dockerfile by @FightingZhen in #4143
- [doc] chore: Add one_step_off_policy support doc of Ascend NPU by @baymax591 in #4151
- [rollout] fix: resource pool name in standalone mode by @PeterSH6 in #4149
- [ci] feat: Update e2e_ascend CI image to 8.3.RC1 version, remove weekly validation workflow by @FightingZhen in #4146
- [doc] chore: add pytorch conference materials by @hongpeng-guo in #4161
- [rollout] fixup load_format=dummy update_weights not do process_weight… by @Annarine in #4130
- [vllm] fix: Change parameter validation to align with vllm validation by @HelloWorldBeginner in #4153
- [trainer] fix: reproducible problem when resume training by @wlhgtc in #4156
- [recipe, tool] feat: support multi-turn and tool call for recipe/fully_async_policy by @sl-1314 in #4067
- [cfg] fix: add rollout_correction config field with omegaconf.open_dict by @tongyx361 in #4167
- [doc] fix: Misc doc fixes by @kerrickstaley in #4171
- [recipe] feat: add qwen3 8b grpo one_step_off_policy script on ASCEND NPU by @baymax591 in #4163
- [BREAKING][rollout] feat: change rollout to server mode by default by @wuxibin89 in #4106
- [algo] feat: Add RateLimitedRewardLoopManager with three-layer rate limiting for API-based rewards by @JoyboyBrian in #4107
- [megatron] feat: load dist checkpoint with customized prefix for state dict keys. by @shevateng0 in #4139
- [megatron] fix: Use tokenizer path or model path in config by @ashvinnihalani in #4091
- [doc] chore: update docker installation guide by @wuxibin89 in #4155
- [recipe] feat: DeepSeek-R1-Zero on Ascend NPU by @johnjunjun7 in #3427
- [recipe] fix: compatibility with vLLM Qwen3Next model by @zjchenn in #4184
- [recipe] fix: readme in recipe/r1_ascend by @HzZHoO in #4183
- [recipe] fix: ReactAgentLoop error handling for failed LangGraph invocations by @le-czs in #4182
- [ci] chore: Update e2e_ascend CI trigger policy by @FightingZhen in #4189
- [recipe] fix: Qwen3-vl npu patch by @leisuzz in #4186
- [rollout, doc] feat: limit tracing samples by @EricMarcus-ai in #4185
- [worker, sglang] fix: Rename the file sglang_router.py to avoid circular imports by @Shiguang-Guo in #4187
- [megatron, recipe] fix: error of megatron init while detached actor and rollout by @lalala-2 in #4179
- [ci, megatron] test: Add Qwen3 Megatron+Mindspeed Ascend NPU CI by @wlf-darkmatter in #3465
- [rollout, vllm] feat: support blockwise fp8 rollout by @Agoniii in #3519
- [ci] feat: Move hf_transfer dependency to requirement file by @FightingZhen in #4210
- [misc] feat: init random model supports custom code in the model by @HollowMan6 in #4217
- [single_controller] feat: support dispatch tensordict by @vermouth1992 in #4213
- [recipe, doc, ckpt] fix: error of ckpt in fully async by @lalala-2 in #4199
- [megatron] feat: FP8 training by @ISEEKYAN in #4223
- [megatron] feat: moe fp16 training by @HaochenYuan in #4158
- [recipe] fix: incorrect reward function in fapo scripts by @yyDing1 in #4195
- [rollout, vllm] feat: support blockwise FP8 rollout for vLLM v0.11 MoE RL by @jQizhang in #4222
- [single_controller] feat: support multiple replicate worker in one resource pool by @yyDing1 in #4226
- [megatron] fix: BF16 mode should use PAO as well by @ashvinnihalani in #4221
- Revert "[megatron] fix: BF16 mode should use PAO as well" by @ISEEKYAN in #4234
- [doc] feat: Add Search Self-Play to awesome work list by @Necolizer in #4245
- [worker] feat: add support for colocate replicas by @yyDing1 in #4233
- [trainer] feat: refactor workers with model engine by @wuxibin89 in #4211
- [single_controller] feat: support resource pool split method by @yyDing1 in #4251
- [recipe] fix: tighten async rollouter task handling by @le-czs in #4230
- Revert "[single_controller] feat: support resource pool split method" by @vermouth1992 in #4258
- Revert "[worker] feat: add support for colocate replicas" by @wuxibin89 in #4259
- Revert "[single_controller] feat: support multiple replicate worker in one resource pool" by @vermouth1992 in #4260
- [ci] fix: Fix triton-ascend unavailable error in Ascend dockerfile by @FightingZhen in #4254
- [ci] fix: Fix error in ascend dockerfile by @FightingZhen in #4265
- [rollout] fix: ensure weight sync regardless of free_cache_engine by @JobQiu in #4248
- [doc] feat: add rollout&train consistency doc for Ascend Platform by @momo609 in #4166
- [recipe] feat: allow customize agent name by @vermouth1992 in #4269
- [ci] fix: Remove redundant uninstall command in e2e_ascend by @FightingZhen in #4267
- [megatron] Fix: fix bugs in mcore backend context-parallel code logic by @Kite0011 in #4250
- [recipe] feat: add Experimental VLA RL Support by @The-Hierophant in #3918
- [recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller by @LLLLxmmm in #4175
- [ci] feat: Increase e2e_sft timeout from 25 to 30 minutes by @vermouth1992 in #4279
- [megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT by @HollowMan6 in #4063
- [single_controller] feat: support resource_pool split by @yyDing1 in #4273
- [recipe] feat: move recipes to new repository verl-recipe by @wuxibin89 in #4283
- [worker] feat: restore colocate workers based on new splited resource pool by @yyDing1 in #4282
- [misc] feat: Add actor_rollout_ref.actor.calculate_entropy for entropy fwd by @EduardDurech in #4239
- [trainer] feat: Self-Normalized Importance Sampling by @EduardDurech in #3980
- [ci, megatron] fix: add rotary_pos_cos_sin to forward by @HollowMan6 in #4291
- [megatron] fix: pass trust_remote_code to get_generation_config by @jprellberg in #4196
- [misc] fix: support nested datastructure in dataproto to convert to tensordict by @PeterSH6 in #4296
- [ci] fix: use local hf model path by @wuxibin89 in #4299
- [data] feat: TransferQueue - Support AgentLoop performance metrics & minor fix by @0oshowero0 in #4289
- [recipe] feat: support reward_loop for recipe/fully_async_policy by @sl-1314 in #4224
- [misc] fix: fix list conversion in get_tensordict by @PeterSH6 in #4304
- [hardware] fix: Workaround for torch-npu's lack of support for creating nested tensors from NPU tensors. by @ji-huazhong in #4309
- [rollout] fix: some compatibility changes in agent loop and reward by @pengwu22 in #4293
- [worker] fix: do not pass router address and tokenizer is their value is none by @yyDing1 in #4310
- [doc] chore: Update ascend quickstart doc by @FightingZhen in #4321
- [misc] feat: add more utils of tensordict by @vermouth1992 in #4322
- [recipe] fix: Fixed scripts for one_step_off_policy async not implemention by @baymax591 in #4350
- [model] feat: refactor engine folder structure by @vermouth1992 in #4352
- [recipe] feat: move char count recipe to verl-recipe by @vermouth1992 in #4351
- [ci] chore: switch ascend ci calculation resource by @FightingZhen in #4347
- feat(actor): add loss_scale_factor for seq-mean-token-sum-norm mode by @szrlee in #4360
- [misc] refactor: clean up unused sharding managers by @ji-huazhong in #4361
- [worker] feat: Add TrainingWorker that resembles Tinker-like API by @vermouth1992 in #4371
- [vllm] fix: Fix issues that occur during the ACLGraph initialization process in the NPU. by @chengminhua in #4209
- [megatron] feat: support gpt-oss by @ISEEKYAN in #4323
- [megatron] fix: megatron async save ckpt fix by @Leem-Li in #4253
- [misc] feat: Update news section in README.md by @vermouth1992 in #4385
- [misc] fix: handle empty TensorDict in DataProto serialization by @le-czs in #4379
- [trainer,fsdp] feat: enable reproducibility for training by @ji-huazhong in #4378
- [trainer] feat: support ray-based sft trainer by @vermouth1992 in #4382
- [megatron] feat: optimize the mbridge checkpoint saving speed by @ISEEKYAN in #4386
- [rollout] feat: add support for discriminative reward model in reward loop by @yyDing1 in #4358
- [recipe] feat: refactor one step off to support server mode by @ArronHZG in #4307
- [misc] feat: support TensorDict in DataProtoFuture by @vermouth1992 in #4395
- [fsdp] fix: Fixing the error caused by empty tensors in the multi_turn + remove_padding scenario by @nuerxiati in #4165
- [doc] fix: add Geo-RS-Seq-TIS estimators and update documentation by @szrlee in #4359
- [worker] feat: custom master addr port by @tongyx361 in #4389
- [doc] feat: update reward loop document by @yyDing1 in #4404
- [algo] feat: support router replay by @litianjian in #4101
- [recipe] fix: FlowRL actor to pure implementation by @Xuekai-Zhu in #4397
- [doc] feat: add more user instructions to reward loop doc by @yyDing1 in #4409
- [doc] feat: add OneThinker link in readme by @appletea233 in #4410
- [ci] fix: NPU not support router replay by @wuxibin89 in #4414
- [worker] feat: custom reward_manager by @tongyx361 in #4387
- [vllm] feat: retires vllm spmd mode in the codebase by @PeterSH6 in #4411
- [sglang] fix: HTTP server startup issues for Prometheus and Grafana integration by @jsfanfanfan in #4408
- [doc] chore: Update ascend quickstart and docker build guidance doc by @FightingZhen in #4420
- [sglang] feat: retires sglang spmd mode in the codebase by @PeterSH6 in #4422
- [fsdp] feat: update NPU fused kernels for Qwen3 moe block by @icerain-alt in #4406
- [misc] refactor: clean up unused sharding manager by @ji-huazhong in #4439
- [hardware] chore: clean npu_patch by @FightingZhen in #4436
- [misc] fix: fix memory leakage when initializing multiple tools by @PeterSH6 in #4430
- [trainer, vllm, megatron, recipe] feat: one/two step off async on-policy distillation recipe by @moehanabi in #3975
- [misc] feat: optimize performance of index_select_tensor_dict by @vermouth1992 in #4444
- [ci] test: Disable ReMax training test in vllm workflow by @PeterSH6 in #4445
- [rollout] fix: RolloutConfig should support repetition_penalty config… by @Lokiscripter in #4398
- [recipe] feat: add fully async comm between rollout and sim node in disagg mode by @HanlinDu in #4433
- [misc] feat: optimize nested tensor index by @vermouth1992 in #4447
- [model] feat: add qwen3-4b grpo script on ASCEND NPU A3 by @5082459 in #4432
- [megatron] fix: Remove Deprecated Megatron Optimizer Args by @DaizeDong in #4396
- [megatron] fix: respect use_distributed_optimizer in config by @HollowMan6 in #4392
- [recipe, ci] fix: remove batch mode for remote generative reward model by @yyDing1 in #4448
- [misc] feat: optimize rearrange_micro_batches by @vermouth1992 in #4451
- [rollout, sglang] feat: support blockwise fp8 rollout by @Agoniii in #4415
- [trainer] feat: model engine sft trainer support vlm model by @wuxibin89 in #4403
- [trainer] feat: add reward loop config to default config by @yyDing1 in #4452
- [vllm] feat: support abort generating requests in vllm server by @PeterSH6 in #4453
- [ci] chore: cleanup some ci workflow by @wuxibin89 in #4459
- [trainer] feat: allow override for reward_manager_worker in agent loop by @ryxli in #4423
- [model] feat: enhances TrainingWorker by @vermouth1992 in #4461
- [recipe] feat: Modify the way of obtaining default_runtime_env by @xichengpro in #4468
- [rollout] fix: mlflow consecutive slashes by @BaiqingL in #4446
- [fsdp] fix: reward model also reads override config attn_implementation by @pengwu22 in #4458
- [vllm] fix: compatible to vllm0.12 by @ISEEKYAN in #4473
- [model] feat: support manual control load/offload by @vermouth1992 in #4472
- [ci] feat: Update e2e_ascend to improve CI execution efficiency by @FightingZhen in #4477
- [ci] fix: Fix e2e_ascend sft test case error by @FightingZhen in #4481
- [trainer] feat: support moving ppo actor logics to single controller by @vermouth1992 in #4480
- [megatron] fix: correct typo in modeling_qwen2_megatron.py by @study8677 in #4486
- [fsdp] fix: qwen3vlmoe with Monkey patch to fix a bug in transformers 4.57.x by @pengyanai in #4402
- [ci] fix: fix format check error by @ji-huazhong in #4506
- [hardware] feat: Auto set device_name to npu for Ascend NPU by @FightingZhen in #4489
- [trainer] feat: make reward loop disrm default by @yyDing1 in #4466
- [algo,doc] refactor: rollout correction by @szrlee in #4511
- [trainer] feat: enable model engine based critic by @vermouth1992 in #4507
- [vllm, rollout] feat: support reset prefix cache after abort by @PeterSH6 in #4519
- [ci] chore: remove proxy settings in e2e_ascend by @FightingZhen in #4527
- [rollout] fix: correct heap-based load balancing in AsyncLLMServerManager by @hellcatCS in #4505
- [sglang, rollout] feat: delete remaining sglang spmd code by @PeterSH6 in #4523
- [data] feat: TransferQueue - Add zero-copy serialization support & usage improvement by @0oshowero0 in #4429
- [rollout] feat: pass agent_data to tool calling by @wuxibin89 in #4469
- [megatron,ci] chore: update instructions and scripts for LoRA by @HollowMan6 in #4533
- [megatron] chore: clean legacy code path part 1, make engine use mbridge by default by @ISEEKYAN in #4528
- [megatron] chore: clean legacy code path part 2, clean legacy CI by @ISEEKYAN in #4529
- [trainer] fix: model engine vlm multi_modal_inputs to NonTensorStack by @wuxibin89 in #4492
- [ray] chore: Update Ray version dependency in requirements-npu.txt by @FightingZhen in #4543
- [ci] chore: migrate all rm related ci to reward loop by @yyDing1 in #4520
- [algo] fix: Add seq mean mask denominator option by @szrlee in #4510
- [trainer] fix: change name for reward loop worker override by @ryxli in #4549
- [rollout,vllm] feat: disable sleep mode in fully-async mode by @chenjiaoAngel in #4521
- [rollout, trainer] feat: extend agent loop for custom implementations by @JoyboyBrian in #4548
- [rollout] chore: update reward loop file names by @yyDing1 in #4547
- [ci] fix: Add mbridge dependency into e2e_ascend by @FightingZhen in #4560
- [doc] feat: add JupyterLab plugin instructions by @yqsstudy in #4536
- [ci] feat: Increase e2e_sft timeout from 30 to 40 minutes by @vermouth1992 in #4552
- [misc] chore: add "reward" tag to PR template by @yyDing1 in #4573
- [BREAKING][recipe, ckpt] feat: support parameter sync by checkpoint-engine. only for fully_async mode. by @zpltys in #4427
- [training_utils] fix: fix model enum acquire logic error in registry by @FightingZhen in #4577
- [megatron] feat: add script for qwen3next training by @ISEEKYAN in #4582
- [ci] fix: exclude FSDP-related source files from Megatron CI by @zzhbrr in #4574
- [reward,ci] fix: cast by @tongyx361 in #4594
- [vllm] feat: TensorLoRARequest support newer vLLM versions by @HollowMan6 in #4606
- [misc] feat: always use robust get_event_loop by @tongyx361 in #4603
- [trainer] feat: Implemented VeomniEngine as an alternative training backend by @A1waysBeenHere in #4072
- [perf] fix: modify the NPU profiler default configuration by @tardis-key in #4475
- [megatron] feat: support discrete profiling for mindspeed by @tardis-key in #4271
- [doc] chore: update LoRA docs with megatron guidelines by @HollowMan6 in #4565
- [reward] feat: Optimize reward computation when use_reward_loop=True by @none0663 in #4581
- [rollout] chore: rename reward loop class name and update ci by @yyDing1 in #4572
- [log] fix: fix wandb log validate run error on async-tool by @chenjiaoAngel in #4591
- [sglang] fix: warmup_thread_args -> warmup_thread_kwargs in async_sglang_server.py by @EduardDurech in #4617
- [reward] feat: use load_extern_object in get_custom_reward_fn, supporting pkg path by @tongyx361 in #4615
- [vllm] fix: correctly pass params to from_lora_tensors in vLLM 0.12.0 by @HollowMan6 in #4614
- [reward,doc] feat: enrich the reward loop documentation by @yyDing1 in #4619
- [megatron] fix: fix MLA with sequence packing + CP by @wuweiqiang24 in #4611
- [megatron, doc] refactor: update the megatron doc by @ISEEKYAN in #4630
- [reward] feat: add retry to the request post method in the reward loop by @yyDing1 in #4628
- [vllm] fix: LoRAModel import path change for vLLM 0.13.0 by @HollowMan6 in #4631
- [misc] refactor: refactor flops counter by @vermouth1992 in #4633
- [misc] feat: add importlib option to import external reward loop module by @PeterSH6 in #4635
- [rollout] feat: ensure max_new_tokens is set correctly in sampling_params by @yanyc428 in #4634
- [recipe] feat: accelerate rollout via model-free speculative decoding by @He-Jingkai in #4535
- [training_utils] feat: use TMA to load Tiles in linear_cross_entropy kernels by @CtfGo in #4576
- [data] feat: Add multimodal dataset fliter for user-customized results by @Kite0011 in #4608
- [vllm] feat: Support online quant for rollout with torchao by @jerryzh168 in #3084
- [misc] feat: Update news section in README.md by @vermouth1992 in #4646
- [algo] feat: add cispo by @xvlincaigou in #4508
- [data] feat: TransferQueue - remove redundant data collect for both TQ and DataProto by @0oshowero0 in #4618
- [recipe, perf] feat: add nsys profiler support for env worker by @chenchaoxu7575 in #4463
- [worker] fix: Add profiler initialization for ActorRolloutRefWorker in engine_worker by @pqhgit in #4586
- [recipe, megatron, fsdp] fix: checkpoint-engine fix trainer param offload in fully-async mode by @zpltys in #4655
- [doc] feat: Add fine-grained profiling tutorial for FSDP and Megatron on Ascend by @mengchengTang in #4610
- [misc] feat: .git-blame-ignore-revs for large but non-informative commits by @tongyx361 in #4661
- [doc] feat: Add OpenTinker to awesome work list by @zhusq20 in #4669
- [fsdp] feat: Support zero2 optional feature for FSDP1 by @ZLiao097 in #4659
- [rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario by @PeterSH6 in #4668
- [data] feat: TransferQueue - Support sync TransferQueue client & optimize clear interface and validation procedure by @0oshowero0 in #4660
- [misc] fix: .git-blame-ignore-revs file is invalid by @HollowMan6 in #4674
- [training_utils] fix: no allocator set when using TMA for kernels by @HollowMan6 in #4676
- [fsdp] fix: replicate ref compute_log_prob (disable calculate_entropy...) in LoRA by @HollowMan6 in #4675
- [algo] SAPO algo by Qwen by @BounharAbdelaziz in #4345
- [megatron] fix: megatron async save ckpt fix by @Leem-Li in #4638
- [ci] fix: fix config by @vermouth1992 in #4685
- Revert "[rollout] fix: delete problematic assert for max_tokens <= response_length in multi-turn scenario" by @vermouth1992 in #4687
- [trainer, fsdp, megatron] feat: support one_step_off_policy on Ascend NPU by @baymax591 in #4686
- [ci] test: add one step off policy test cases for npu by @ji-huazhong in #4485
- fix(lora): use TOKEN_CLS task type for Critic model by @yurekami in #4695
- fix: correct enable_activation_offload config parameter name by @yurekami in #4692
- [misc] fix: deprecate rollout.mode config option by @yurekami in #4690
- [ci] feat: Set Megatron related environment variable with ENV in Ascend dockerfile by @FightingZhen in #4699
- [docker] feat: update stable image to vllm==0.12.0, sglang==0.5.6 by @Begunner in #4653
- [rollout] fix: use configured response_length as default max_tokens in vLLM async server by @yurekami in #4703
- [megatron] fix: set model to eval during compute_log_prob/compute_values by @HollowMan6 in #4708
- [trainer] fix: fallback vision tower to flash_attention_2 for Qwen2.5-VL when u… by @aoshen524 in #4670
- [docker] fix: new images for sgl056 and vllm012 have compatibility issues by @Begunner in #4714
- [docs] feat: improve docstrings in tensordict_utils.py (#1345) by @yurekami in #4732
- [rollout,docs] fix: improve error message (#4682) and docstrings (#1345) by @yurekami in #4729
- docs: fix typos in code comments and messages by @yurekami in #4724
- [training_utils] fix: RM extra scaling in KL/PG losses by @JacobHelwig in #4711
- [deployment] feat: support build docker image with aarch64 platform by @rainj-me in #4605
- [megatron] fix: Bump Megatron-Bridge commit for PEFT recompute by @HollowMan6 in #4702
- [docs] feat: improve docstrings in seqlen_balancing.py (#1345) by @yurekami in #4731
- [doc] feat: improve docstrings in torch_functional.py (#1345) by @yurekami in #4730
- [reward] fix: make
RateLimitedRewardManageraccept legacy kwargs by @JoyboyBrian in #4739 - [perf] feat: support profiler in model engine and sft trainer by @vermouth1992 in #4749
- [ci] test: move cpu tests to volcengine machines by @Begunner in #4738
- [trainer,megatron] fix: super tiny fix the issue of repeatedly importing the mindspeed patch by @ji-huazhong in #4751
- [perf]feat: GPT-OSS mfu compute support by @mikequan0425 in #4750
- [tool] fix: attach to existing MLflow run when MLFLOW_RUN_ID is set by @dubin555 in #4740
- [trainer] fix: use dp_size instead of world_size in _balance_batch by @yurekami in #4697
- [rollout] feat: Add vllm logprob mode and default processed_logprob by @RobotGF in #4755
- [ci] fix: fix precommit by @vermouth1992 in #4760
- [ci] test: migrate sft test cases on npu to model engine implementation by @ji-huazhong in #4762
- [doc, cfg] fix: correct typos in training and docker configurations by @Racktic in #4767
- [vllm] fix: use packaging.version for correct semantic version comparison by @Racktic in #4768
- Revert "[algo] fix: Add seq mean mask denominator option" by @wuxibin89 in #4769
- [data] feat: major refactor RLHFDataset for multi-modal data by @wuxibin89 in #4759
- [perf] feat: add remote reward manager and fix math verify issue by @yyDing1 in #4752
- [trainer] feat: enable ray-based sft trainer on ascend npu by @ji-huazhong in #4764
- [training_utils] fix: Nested tensor micro-batching by @JacobHelwig in #4776
- [worker] fix: Config for PPO batch size by @JacobHelwig in #4773
- [ci] fix: fix cpu unit test by @vermouth1992 in #4774
- [cfg] chore: remove redundant fields and fix typo by @JoyboyBrian in #4754
- [worker] fix: Model engine parameter offload by @JacobHelwig in #4777
- [fsdp] feat: integrate TiledMLP for memory-efficient MLP computation by @kevssim in #4649
- [doc] chore: Update ascend_quick_start.rst by @wucong25 in #4609
- [sglang, vllm, rollout] fix: use model's max_position_embeddings for max_model_len by @PeterSH6 in #4779
- [doc] fix: reward_loop enable flag name by @zhuangqh in #4788
- [doc] feat: add v0.7 release blog by @wuxibin89 in #4796
New Contributors
- @gongyisheng made their first contribution in #4125
- @Annarine made their first contribution in #4130
- @HelloWorldBeginner made their first contribution in #4153
- @wlhgtc made their first contribution in #4156
- @sl-1314 made their first contribution in #4067
- @kerrickstaley made their first contribution in #4171
- @JoyboyBrian made their first contribution in #4107
- @shevateng0 made their first contribution in #4139
- @ashvinnihalani made their first contribution in #4091
- @johnjunjun7 made their first contribution in #3427
- @zjchenn made their first contribution in #4184
- @HzZHoO made their first contribution in #4183
- @EricMarcus-ai made their first contribution in #4185
- @Shiguang-Guo made their first contribution in #4187
- @Agoniii made their first contribution in #3519
- @jQizhang made their first contribution in #4222
- @JobQiu made their first contribution in #4248
- @momo609 made their first contribution in #4166
- @Kite0011 made their first contribution in #4250
- @LLLLxmmm made their first contribution in #4175
- @jprellberg made their first contribution in #4196
- @chengminhua made their first contribution in #4209
- @Leem-Li made their first contribution in #4253
- @nuerxiati made their first contribution in #4165
- @litianjian made their first contribution in #4101
- @appletea233 made their first contribution in #4410
- @jsfanfanfan made their first contribution in #4408
- @icerain-alt made their first contribution in #4406
- @Lokiscripter made their first contribution in #4398
- @HanlinDu made their first contribution in #4433
- @5082459 made their first contribution in #4432
- @DaizeDong made their first contribution in #4396
- @ryxli made their first contribution in #4423
- @study8677 made their first contribution in #4486
- @pengyanai made their first contribution in #4402
- @hellcatCS made their first contribution in #4505
- @yqsstudy made their first contribution in #4536
- @zpltys made their first contribution in #4427
- @zzhbrr made their first contribution in #4574
- @wuweiqiang24 made their first contribution in #4611
- @yanyc428 made their first contribution in #4634
- @He-Jingkai made their first contribution in #4535
- @CtfGo made their first contribution in #4576
- @jerryzh168 made their first contribution in #3084
- @xvlincaigou made their first contribution in #4508
- @chenchaoxu7575 made their first contribution in #4463
- @pqhgit made their first contribution in #4586
- @mengchengTang made their first contribution in #4610
- @zhusq20 made their first contribution in #4669
- @BounharAbdelaziz made their first contribution in #4345
- @yurekami made their first contribution in #4695
- @Begunner made their first contribution in #4653
- @JacobHelwig made their first contribution in #4711
- @rainj-me made their first contribution in #4605
- @mikequan0425 made their first contribution in #4750
- @dubin555 made their first contribution in #4740
- @RobotGF made their first contribution in #4755
- @Racktic made their first contribution in #4767
- @wucong25 made their first contribution in #4609
- @zhuangqh made their first contribution in #4788
Full Changelog: v0.6.1...v0.7.0