CogVideoX-5B
This patch release adds diffusers support for the upcoming CogVideoX-5B release! The model weights will be available next week on the Hugging Face Hub at THUDM/CogVideoX-5b. Stay tuned for the release!
Additionally, we have implemented a VAE tiling feature, which reduces the memory requirement for the CogVideoX models. With this update, the total memory requirement is now 12GB for CogVideoX-2B and 21GB for CogVideoX-5B (with CPU offloading). To enable this feature, simply call `enable_tiling()` on the VAE.
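As a rough illustration of why tiling lowers peak memory: the decoder can process the latent frame in overlapping spatial tiles and blend the seams, so only one tile needs to be resident at a time instead of the full frame. The tile and overlap sizes below are made-up values for illustration, not diffusers' actual defaults:

```python
# Sketch: how spatial tiling splits a latent frame into overlapping tiles.
# Tile/overlap sizes here are hypothetical, NOT the values diffusers uses.

def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Left edges of overlapping tiles covering [0, length)."""
    stride = tile - overlap
    starts = list(range(0, max(length - overlap, 1), stride))
    # Clamp the last tile so it ends exactly at `length`.
    if starts[-1] + tile > length:
        starts[-1] = max(length - tile, 0)
    return starts

def peak_elements(h: int, w: int, tile: int, overlap: int) -> tuple[int, int]:
    # Peak memory scales with the largest region decoded at once:
    # the whole frame without tiling, a single tile with tiling.
    full = h * w
    tiled = min(tile, h) * min(tile, w)
    return full, tiled

full, tiled = peak_elements(h=96, w=170, tile=64, overlap=16)
print(full, tiled)  # elements decoded at once per frame, without vs. with tiling
```

With these illustrative numbers, a single tile is a small fraction of the full frame, which is where the memory savings come from; the overlapping edges are blended so no seams appear in the decoded video.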
The code below shows how to generate a video with CogVideoX-5B:
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = "Tracking shot,late afternoon light casting long shadows,a cyclist in athletic gear pedaling down a scenic mountain road,winding path with trees and a lake in the background,invigorating and adventurous atmosphere."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.bfloat16,
)

pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```
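For reference, the settings above produce a 49-frame clip exported at 8 fps, i.e. a little over six seconds of video:

```python
# Clip duration from the generation settings above.
num_frames = 49
fps = 8
duration_s = num_frames / fps
print(duration_s)  # 6.125 seconds
```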
Refer to our documentation to learn more about it.
All commits
- Update Video Loading/Export to use `imageio` by @DN6 in #9094
- [refactor] CogVideoX followups + tiled decoding support by @a-r-r-o-w in #9150
- Add Learned PE selection for Auraflow by @cloneofsimo in #9182
- [Single File] Fix configuring scheduler via legacy kwargs by @DN6 in #9229
- [Flux LoRA] support parsing alpha from a flux lora state dict. by @sayakpaul in #9236
- [tests] fix broken xformers tests by @a-r-r-o-w in #9206
- Cogvideox-5B Model adapter change by @zRzRzRzRzRzRzR in #9203
- [Single File] Support loading Comfy UI Flux checkpoints by @DN6 in #9243