verl-project/verl v0.3.0.post0
v0.3.0.post0 release

Highlights

New algorithms and recipes

Vision-language reasoning with Qwen2.5-VL #386
PRIME, RLOO, ReMax #753 #234 #341
FIRE sampling algorithm, math-verify rewards #545 #683

Engine

SGLang integration is available for preview (single node with FSDP). Blazing fast! Please try it and give us feedback. We recommend using the verl main branch to pick up ongoing SGLang-related fixes and improvements made in response to feedback.

--actor_rollout_ref.rollout.name='sglang'

Megatron is now upgraded to v0.11, with support for the checkpoint manager, Qwen models, and the GRPO algorithm.
vLLM is upgraded to v0.8.2, which is much faster than vLLM v0.7 and v0.6.3 during rollout with the V1 engine! Please remember to enable CUDA graphs with the following options. There were memory leak issues before vLLM v0.8.2, so we recommend using either vLLM v0.6.3 or v0.8.2.

actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
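
For readers who launch training from Python, the two options above can be assembled programmatically. The following is a minimal sketch, assuming the trainer accepts Hydra-style `key=value` overrides on the command line; the helper name `format_overrides` is hypothetical and not part of verl.

```python
# Sketch only: format the rollout options quoted in these notes as
# key=value override strings. `format_overrides` is a hypothetical helper.

def format_overrides(options: dict) -> list[str]:
    """Turn {"a.b": value} into ["a.b=value", ...], preserving insertion order."""
    return [f"{key}={value}" for key, value in options.items()]


# Options quoted in the notes for vLLM v0.8.2 (CUDA graphs enabled).
vllm_options = {
    "actor_rollout_ref.rollout.enforce_eager": False,
    "actor_rollout_ref.rollout.free_cache_engine": False,
}

overrides = format_overrides(vllm_options)

# Print one override per line, joined with shell line continuations.
print(" \\\n".join(overrides))
```

The resulting list can be appended to whatever launch command you already use; the launch command itself is deliberately left out because it depends on your setup.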

Hardware:

AMD support is available for the vLLM and FSDP backends. A getting-started one-pager is available.

Docs:

Tutorials for distributed training setup, debugging, and the programming model

Roadmap for Q2: #710. Contributions are welcome!

Changelog

New Features

Algorithm Support

Support for extra_info in reward calculation
RLOO advantage estimator
PRIME algorithm (recipe and baseline)
Initial support for VLMs (Vision-Language Models), including a Qwen2.5-VL GRPO example
Math-Verify Support
Support for GRPO with Megatron backend
Added FIRE sampling in rollout
Replaced DataLoader with StatefulDataLoader for checkpoint resuming
Support for external reward function loading
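
To illustrate the RLOO advantage estimator listed above, here is a minimal sketch of the standard leave-one-out formulation: each sample's baseline is the mean reward of the other samples drawn for the same prompt. This is not verl's implementation; the function name, the scalar-reward inputs, and the choice to return 0.0 for single-sample groups are assumptions made for the example.

```python
from collections import defaultdict


def rloo_advantages(rewards, group_ids):
    """Leave-one-out advantages.

    rewards:   one scalar reward per sampled response.
    group_ids: one hashable id per response, identifying its prompt.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for reward, group in zip(rewards, group_ids):
        sums[group] += reward
        counts[group] += 1

    advantages = []
    for reward, group in zip(rewards, group_ids):
        k = counts[group]
        if k < 2:
            # Assumption: with a single sample there is no baseline, so use 0.0.
            advantages.append(0.0)
        else:
            # Baseline is the mean reward of the other k - 1 samples.
            baseline = (sums[group] - reward) / (k - 1)
            advantages.append(reward - baseline)
    return advantages


rewards = [1.0, 0.0, 0.0, 1.0, 1.0]
group_ids = ["a", "a", "a", "b", "b"]
print(rloo_advantages(rewards, group_ids))  # [1.0, -0.5, -0.5, 0.0, 0.0]
```

In group "b" both samples earn the same reward, so neither receives a learning signal, which is the expected behavior of a leave-one-out baseline.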

Performance Improvements

Support for SGLang as a rollout engine
Support for Ulysses sequence parallel (transformers >= 4.48)
Support for offloading parameters and optimizer during rollout
Tracking support for vemlp and TensorBoard
MFU (Model FLOPS Utilization) calculation for Megatron workers
Support for AMD (ROCm kernel)
Improved checkpoint loading (Megatron support for Llama/Qwen models)
Removed unnecessary torch.cuda.empty_cache() calls
Optimized weight loading (replaced custom vLLM loader with model.load_weights)
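
As background for the MFU item above, the following is a rough sketch of how Model FLOPs Utilization is commonly estimated: achieved FLOPs per second divided by the hardware's peak. It uses the widely cited approximation of about 6 FLOPs per parameter per token for a combined forward and backward pass, which ignores attention FLOPs. How verl's Megatron workers count FLOPs may differ; the function name and all numbers below are illustrative assumptions.

```python
def estimate_mfu(
    num_params: float,
    tokens_per_second: float,
    peak_flops_per_second: float,
    flops_per_param_per_token: float = 6.0,
) -> float:
    """Return achieved FLOPs/s divided by peak FLOPs/s (a value in [0, 1] for sane inputs)."""
    if peak_flops_per_second <= 0:
        raise ValueError("peak_flops_per_second must be positive")
    achieved = flops_per_param_per_token * num_params * tokens_per_second
    return achieved / peak_flops_per_second


# Illustrative numbers only: a 7B-parameter model processing 2,000 tokens/s
# on one device with a 312 TFLOP/s peak.
mfu = estimate_mfu(num_params=7e9, tokens_per_second=2_000, peak_flops_per_second=312e12)
print(f"{mfu:.1%}")  # 26.9%
```

Throughput and peak must refer to the same scope (per device or per cluster); mixing the two is a common source of implausible MFU values.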

Bug Fixes

Fixed wrong args description
Fixed Gemma2 example and NGC Dockerfile
Fixed offload/load optimizer implementation
Fixed vLLM documentation links
Fixed typos and spelling errors
Fixed evaluation file path in ReMax training scripts
Fixed OOM when resuming from checkpoint
Fixed position embedding for Qwen2.5-VL
Fixed PRIME algorithm issues (filtering long prompts, padding side, xformers)
Fixed FSDP checkpoint loading
Fixed SGLang rollout under multi-node
Fixed Python environment issues in installation
Fixed validation batch repeat before feeding into rollout

Deprecations and Breaking Changes

Deprecated val_batch_size
Removed redundant config parameters
Reverted RLHFDataset truncation config

Improvements

Documentation

Added Ray on Slurm example
Added FAQ for vLLM illegal memory access
Added distributed training docs (RLOO, VolcEngine)
Updated vLLM (>=0.7, >=0.8) documentation
Added meetup info, blogs, and project references
Improved Slurm example parameters
Added multi-node training and debug tutorial

Tooling & CI/CD

Added Dependabot action
Added secrets scan action
Added CI timeout and auto-cancel previous CI runs
Added e2e_ascend CI
Improved dataset handling in CI

Miscellaneous

Added assertion checks for PPO mini-batch size
Improved logging (SwanLab integration)
Pre-check resource pool availability to prevent hangs
Added tqdm progress bar for RayPPOTrainer
Skip special tokens in processing
Support for faster model downloads from ModelScope
Added Dockerfile for AWS SageMaker

New Contributors

This release includes contributions from 60 contributors, 47 of whom are new!
@AnselCmy @BASARANOMO @BaiqingL @BeSkyer @BearBiscuit05 @CajZella @Django-Jiang @DolbyUUU @ETOgaosion @HaoshengZou @ISEEKYAN @Kunlun-Zhu @PeterSH6 @PzySeere @Raf-Chen @WillemJiang @Yifan-Song793 @ZSL98 @Zeetc @ZefanW @Zeyi-Lin @caaatch22 @celestialli @danielz02 @dependabot @dirtyDan0 @eltociear @eric-haibin-lin @fyqqyf @gameofdimension @ganler @haoy-zzz @hiyouga @hongpeng-guo @iceflame89 @jayl940712 @kinman0224 @laonahongchen @liudayuan-carrot @maksimstw @mi804 @minleminzui @nomadlx @none0663 @nwiad @ocss884 @pat-jj @thomZ1 @tongyx361 @uygnef @vermouth1992 @wangchengnuo @wuxibin89 @xffxff @yaguanghu @yushengsu-thu @yyDing1 @zhanluxianshen @zhr2001 @zpqiu
Thank you all for making verl better!!

Full Changelog: v0.2.0.post2...v0.3.0.post0

Known issues tracker: #827
