Highlights
New algorithms and recipes
- Vision language reasoning with qwen2.5-vl #386
- PRIME, RLOO, remax #753 #234 #341
- FIRE sampling algorithm, math-verify rewards #545 #683
Engine
- sglang integration is available for preview (single node with FSDP). Blazing fast! Please try it and give us feedback! We recommend using the verl main branch to pick up ongoing sglang-related fixes and improvements based on that feedback.
--actor_rollout_ref.rollout.name='sglang'
- Megatron is now upgraded to v0.11, with support for the checkpoint manager, Qwen models, and the GRPO algorithm
- vllm is upgraded to v0.8.2, which is much faster than vllm v0.7 & v0.6.3 during rollout with the v1 engine! Please remember to enable cuda graph with the following options. There were memory leak issues in versions before vllm v0.8.2, so we recommend using either vllm v0.6.3 or v0.8.2.
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
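As a minimal sketch of how the two options above could be assembled programmatically: the helper `to_overrides` and the dictionary name are hypothetical and not part of verl; only the two option keys and their values come from these notes. How the resulting strings are passed to a training entry point is deliberately left out.

```python
from typing import Dict, List


def to_overrides(options: Dict[str, object]) -> List[str]:
    """Render a dict as key=value override strings (hypothetical helper)."""
    return [f"{key}={value}" for key, value in options.items()]


# The two options quoted in the release notes for enabling cuda graph.
vllm_cuda_graph_options = {
    "actor_rollout_ref.rollout.enforce_eager": False,
    "actor_rollout_ref.rollout.free_cache_engine": False,
}

overrides = to_overrides(vllm_cuda_graph_options)

# Print them in the same backslash-continued form used above.
print(" \\\n".join(overrides))
```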
Hardware:
- AMD support is available for the vllm and FSDP backends. A getting-started one-pager is available.
Docs:
- tutorial for distributed training setup, debugging, and the programming model
Roadmap for Q2: #710. Contributions are welcome!
Changelog
New Features
Algorithm Support
- Support for extra_info in reward calculation
- RLOO advantage estimator
- PRIME algorithm (recipe and baseline)
- Initial support for VLMs (Vision-Language Models), including Qwen2.5VL GRPO example
- Math-Verify Support
- Support for GRPO with Megatron backend
- Added FIRE sampling in rollout
- Replaced DataLoader with StatefulDataLoader for checkpoint resuming
- Support for external reward function loading
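For readers unfamiliar with the RLOO advantage estimator listed above, the core idea can be sketched in a few lines: each sampled response to a prompt is scored against the mean reward of the other responses to the same prompt (a leave-one-out baseline). This is an illustrative sketch only, not verl's implementation; the function name `rloo_advantages` and the per-prompt list of scalar rewards are assumptions.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k responses sampled from one prompt.

    advantage_i = r_i - mean(r_j for j != i)
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two responses per prompt")
    total = sum(rewards)
    # (total - r) / (k - 1) is the mean reward of the other k - 1 responses.
    return [r - (total - r) / (k - 1) for r in rewards]


# Four responses to one prompt, two of them rewarded.
advantages = rloo_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because every response is compared only with its siblings, the advantages for one prompt always sum to zero.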
Performance Improvements
- Support for SGLang as a rollout engine
- Support for Ulysses sequence parallel (transformers >= 4.48)
- Support offloading parameters and optimizer during rollout
- Tracking support for vemlp and TensorBoard
- MFU (Model FLOPS Utilization) calculation for Megatron workers
- Support for AMD (ROCm kernel)
- Improved checkpoint loading (Megatron support for Llama/Qwen models)
- Remove unnecessary torch.cuda.empty_cache() calls
- Optimized weight loading (replaced custom VLLM loader with model.load_weights)
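As a rough illustration of what the MFU (Model FLOPS Utilization) figure mentioned above measures, the sketch below divides achieved training FLOPs per second by the aggregate hardware peak. It assumes the common approximation of 6 FLOPs per parameter per token for a forward plus backward pass, which ignores attention cost; the function name and parameters are hypothetical, and the calculation in the Megatron workers may differ.

```python
def estimate_mfu(
    num_params: float,
    tokens_per_step: float,
    step_time_s: float,
    num_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Approximate MFU as achieved FLOPs/s over total peak FLOPs/s."""
    if step_time_s <= 0 or num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("step time, GPU count, and peak FLOPs must be positive")
    # Assumption: ~6 FLOPs per parameter per token (forward + backward).
    flops_per_step = 6.0 * num_params * tokens_per_step
    achieved_flops_per_s = flops_per_step / step_time_s
    return achieved_flops_per_s / (num_gpus * peak_flops_per_gpu)


# Toy numbers chosen for easy arithmetic, not a real benchmark.
mfu = estimate_mfu(
    num_params=1e9,
    tokens_per_step=1000,
    step_time_s=1.0,
    num_gpus=1,
    peak_flops_per_gpu=6e13,
)
print(f"MFU: {mfu:.2%}")
```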
Bug Fixes
- Fixed wrong args description
- Fixed Gemma2 example and NGC Dockerfile
- Fixed offload/load optimizer implementation
- Fixed VLLM documentation links
- Fixed typos and spelling errors
- Fixed evaluation file path in Remax training scripts
- Fixed OOM when resuming from checkpoint
- Fixed position embedding for Qwen2.5-VL
- Fixed PRIME algorithm issues (filtering long prompts, padding side, xformers)
- Fixed FSDP checkpoint loading
- Fixed SGLang rollout under multi-node
- Fixed Python environment issues in installation
- Fixed validation batch repeat before feeding into rollout
Deprecations and Breaking Changes
- Deprecated val_batch_size
- Removed redundant config parameters
- Reverted RLHFDataset truncation config
Improvements
Documentation
- Added Ray on Slurm example
- Added FAQ for VLLM illegal memory access
- Added distributed training docs (RLOO, VolcEngine)
- Updated VLLM (>=0.7, >=0.8) documentation
- Added meetup info, blogs, and project references
- Improved Slurm example parameters
- Added multi-node training and debug tutorial
Tooling & CI/CD
- Added Dependabot action
- Added secrets scan action
- Added CI timeout and auto-cancel previous CI runs
- Added e2e_ascend CI
- Improved dataset handling in CI
Miscellaneous
- Added assertion checks for PPO mini-batch size
- Improved logging (SwanLab integration)
- Pre-check resource pool availability to prevent hangs
- Added tqdm progress bar for RayPPOTrainer
- Skip special tokens in processing
- Support for faster model downloads from ModelScope
- Added Dockerfile for AWS SageMaker
New Contributors
This release was contributed by 60 contributors, of whom 47 are new contributors!
@AnselCmy @BASARANOMO @BaiqingL @BeSkyer @BearBiscuit05 @CajZella @Django-Jiang @DolbyUUU @ETOgaosion @HaoshengZou @ISEEKYAN @Kunlun-Zhu @PeterSH6 @PzySeere @Raf-Chen @WillemJiang @Yifan-Song793 @ZSL98 @Zeetc @ZefanW @Zeyi-Lin @caaatch22 @celestialli @danielz02 @dependabot @dirtyDan0 @eltociear @eric-haibin-lin @fyqqyf @gameofdimension @ganler @haoy-zzz @hiyouga @hongpeng-guo @iceflame89 @jayl940712 @kinman0224 @laonahongchen @liudayuan-carrot @maksimstw @mi804 @minleminzui @nomadlx @none0663 @nwiad @ocss884 @pat-jj @thomZ1 @tongyx361 @uygnef @vermouth1992 @wangchengnuo @wuxibin89 @xffxff @yaguanghu @yushengsu-thu @yyDing1 @zhanluxianshen @zhr2001 @zpqiu
Thank you all for making verl better!!
Full Changelog: v0.2.0.post2...v0.3.0.post0
Known issues tracker: #827