huggingface/diffusers v0.30.0 on GitHub

New pipelines

Image taken from the Lumina’s GitHub.

This release features many new pipelines. Below, we provide a list:

Audio pipelines 🎼

Stable Audio

Video pipelines 📹

Latte (thanks to @maxin-cn for the contribution through #8404)
CogVideoX (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)

Image pipelines 🎇

Lumina (thanks to @PommesPeter for the contribution through #8652)
Kolors
AuraFlow
Flux

Be sure to check out the respective docs to know more about these pipelines. Some additional pointers are below for curious minds:

Lumina introduces a new DiT architecture that is multilingual in nature.
Kolors is inspired by SDXL and is also multilingual in nature.
Flux introduces the largest (more than 12B parameters!) open-sourced DiT variant available to date. For efficient DreamBooth + LoRA training, we recommend @bghira’s guide here.
We have worked on a guide that shows how to quantize these large pipelines for memory efficiency with optimum.quanto. Check it out here.
CogVideoX introduces a novel and truly 3D VAE into Diffusers.

Perturbed Attention Guidance (PAG)

Without PAG	With PAG

We already had community pipelines for PAG, but given its usefulness, we decided to make it a first-class citizen of the library. We have a central usage guide for PAG here, which should be the entry point for a user interested in understanding and using PAG for their use cases. We currently support the following pipelines with PAG:

StableDiffusionPAGPipeline
StableDiffusion3PAGPipeline
StableDiffusionControlNetPAGPipeline
StableDiffusionXLPAGPipeline
StableDiffusionXLPAGImg2ImgPipeline
StableDiffusionXLPAGInpaintPipeline
StableDiffusionXLControlNetPAGPipeline
StableDiffusion3PAGPipeline
PixArtSigmaPAGPipeline
HunyuanDiTPAGPipeline
AnimateDiffPAGPipeline
KolorsPAGPipeline

If you’re interested in helping us extend our PAG support for other pipelines, please check out this thread.
Special thanks to Ahn Donghoon (@sunovivid), the author of PAG, for helping us with the integration and adding PAG support to SD3.

AnimateDiff with SparseCtrl

SparseCtrl introduces methods of controllability into text-to-video diffusion models leveraging signals such as line/edge sketches, depth maps, and RGB images by incorporating an additional condition encoder, inspired by ControlNet, to process these signals in the AnimateDiff framework. It can be applied to a diverse set of applications such as interpolation or video prediction (filling in the gaps between sequence of images for animation), personalized image animation, sketch-to-video, depth-to-video, and more. It was introduced in SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.

There are two SparseCtrl-specific checkpoints and a Motion LoRA made available by the authors namely:

Scribble Interpolation Example:

import torch

from diffusers import AnimateDiffSparseControlNetPipeline, AutoencoderKL, MotionAdapter, SparseControlNetModel
from diffusers.schedulers import DPMSolverMultistepScheduler
from diffusers.utils import export_to_gif, load_image

motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16).to(device)
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16).to(device)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to(device)
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    vae=vae,
    scheduler=scheduler,
    torch_dtype=torch.float16,
).to(device)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, beta_schedule="linear", algorithm_type="dpmsolver++", use_karras_sigmas=True)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-v1-5-3", adapter_name="motion_lora")
pipe.fuse_lora(lora_scale=1.0)

prompt = "an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality"
negative_prompt = "low quality, worst quality, letterboxed"

image_files = [
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png"
]
condition_frame_indices = [0, 8, 15]
conditioning_frames = [load_image(img_file) for img_file in image_files]

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    conditioning_frames=conditioning_frames,
    controlnet_conditioning_scale=1.0,
    controlnet_frame_indices=condition_frame_indices,
    generator=torch.Generator().manual_seed(1337),
).frames[0]
export_to_gif(video, "output.gif")

📜 Check out the docs here.

FreeNoise for AnimateDiff

FreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.

Instead of initializing noises for all frames, FreeNoise reschedules a sequence of noises for long-range correlation and performs temporal attention over them using a window-based function. We have added FreeNoise to the AnimateDiff family of models in Diffusers, allowing them to generate videos beyond their default 32 frame limit.

import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerAncestralDiscreteScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = EulerAncestralDiscreteScheduler(
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)

pipe.enable_free_noise()
pipe.vae.enable_slicing()

pipe.enable_model_cpu_offload()
frames = pipe(
    "An astronaut riding a horse on Mars.",
    num_frames=64,
    num_inference_steps=20,
    guidance_scale=7.0,
    decode_chunk_size=2,
).frames[0]

export_to_gif(frames, "freenoise-64.gif")

LoRA refactor

We have significantly refactored the loader classes associated with LoRA. Going forward, this will help in adding LoRA support for new pipelines and models. We now have a LoraBaseMixin class which is subclassed by the different pipeline-level LoRA loading classes such as StableDiffusionXLLoraLoaderMixin. This document provides an overview of the available classes.

Additionally, we have increased the coverage of methods within the PeftAdapterMixin class. This refactoring allows all the supported models to share common LoRA functionalities such set_adapter(), add_adapter(), and so on.

To learn more details, please follow this PR. If you see any LoRA-related issues stemming from these refactors, please open an issue.

🚨 Fixing attention projection fusion

We discovered that the implementation of fuse_qkv_projections() was broken. This was fixed in this PR. Additionally, this PR added the fusion support to AuraFlow and PixArt Sigma. A reasoning as to where this kind of fusion might be useful is available here.

All commits

[Release notification] add some info when there is an error. by @sayakpaul in #8718
Modify FlowMatch Scale Noise by @asomoza in #8678
Fix json WindowsPath crash by @vincedovy in #8662
Motion Model / Adapter versatility by @Arlaz in #8301
[Chore] perform better deprecation for vqmodeloutput by @sayakpaul in #8719
[Advanced dreambooth lora] adjustments to align with canonical script by @linoytsaban in #8406
[Tests] Fix precision related issues in slow pipeline tests by @DN6 in #8720
fix: ValueError when using FromOriginalModelMixin in subclasses #8440 by @fkcptlst in #8454
[Community pipeline] SD3 Differential Diffusion Img2Img Pipeline by @asomoza in #8679
Benchmarking workflow fix by @sayakpaul in #8389
add PAG support for SD architecture by @shauray8 in #8725
shift cache in benchmarking. by @sayakpaul in #8740
[train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8476
fix the LR schedulers for dreambooth_lora by @WenheLI in #8510
[Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support by @gnobitab in #8747
Always raise from previous error by @Wauplin in #8751
[doc] add a tip about using SDXL refiner with hunyuan-dit and pixart by @yiyixuxu in #8735
Remove legacy single file model loading mixins by @DN6 in #8754
Allow from_transformer in SD3ControlNetModel by @haofanwang in #8749
[SD3 LoRA Training] Fix errors when not training text encoders by @asomoza in #8743
[Tests] add test suite for SD3 DreamBooth by @sayakpaul in #8650
[hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding by @yiyixuxu in #8761
Enforce ordering when running Pipeline slow tests by @DN6 in #8763
Fix warning in UNetMotionModel by @DN6 in #8756
Fix indent in dreambooth lora advanced SD 15 script by @DN6 in #8753
Fix mistake in Single File Docs page by @DN6 in #8765
Reflect few contributions on philosophy.md that were not reflected on #8294 by @mreraser in #8690
correct attention_head_dim for JointTransformerBlock by @yiyixuxu in #8608
[LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8670
Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @sayakpaul in #8773
Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab by @sayakpaul in #8762
Update README.md to include Colab link by @sayakpaul in #8775
[Chore] add dummy lora attention processors to prevent failures in other libs by @sayakpaul in #8777
[advanced dreambooth lora] add clip_skip arg by @linoytsaban in #8715
[Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet by @gnobitab in #8783
Fix minor bug in SD3 img2img test by @a-r-r-o-w in #8779
[Tests] fix sharding tests by @sayakpaul in #8764
Add vae_roundtrip.py example by @thomaseding in #7104
[Single File] Allow loading T5 encoder in mixed precision by @DN6 in #8778
Fix saving text encoder weights and kohya weights in advanced dreambooth lora script by @DN6 in #8766
Improve model card for push_to_hub trainers by @apolinario in #8697
fix loading sharded checkpoints from subfolder by @yiyixuxu in #8798
[Alpha-VLLM Team] Add Lumina-T2X to diffusers by @PommesPeter in #8652
Fix static typing and doc typos by @zhuoqun-chen in #8807
Remove unnecessary lines by @tolgacangoz in #8569
Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference by @IrohXu in #8709
[Tests] fix more sharding tests by @sayakpaul in #8797
Reformat docstring for get_timestep_embedding by @alanhdu in #8811
Latte: Latent Diffusion Transformer for Video Generation by @maxin-cn in #8404
[Core] Add Kolors by @asomoza in #8812
[Core] Add AuraFlow by @sayakpaul in #8796
Add VAE tiling option for SD3 by @DN6 in #8791
Add single file loading support for AnimateDiff by @DN6 in #8819
[Docs] add AuraFlow docs by @sayakpaul in #8851
[Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU by @ustcuna in #8643
add PAG support sd15 controlnet by @tuanh123789 in #8820
[tests] fix typo in pag tests by @a-r-r-o-w in #8845
[Docker] include python3.10 dev and solve header missing problem by @sayakpaul in #8865
[Cont'd] Add the SDE variant of ~~DPM-Solver~~ and DPM-Solver++ to DPM Single Step by @tolgacangoz in #8269
modify pocs. by @sayakpaul in #8867
[Core] fix: shard loading and saving when variant is provided. by @sayakpaul in #8869
[Chore] allow auraflow latest to be torch compile compatible. by @sayakpaul in #8859
Add AuraFlowPipeline and KolorsPipeline to auto map by @Beinsezii in #8849
Fix multi-gpu case for train_cm_ct_unconditional.py by @tolgacangoz in #8653
[docs] pipeline docs for latte by @a-r-r-o-w in #8844
[Chore] add disable forward chunking to SD3 transformer. by @sayakpaul in #8838
[Core] remove resume_download from Hub related stuff by @sayakpaul in #8648
Add option to SSH into CPU runner. by @DN6 in #8884
SSH into cpu runner fix by @DN6 in #8888
SSH into cpu runner additional fix by @DN6 in #8893
[SDXL] Fix uncaught error with image to image by @asomoza in #8856
fix loop bug in SlicedAttnProcessor by @shinetzh in #8836
[fix code annotation] Adjust the dimensions of the rotary positional embedding. by @wangqixun in #8890
allow tensors in several schedulers step() call by @catwell in #8905
Use model_info.id instead of model_info.modelId by @Wauplin in #8912
[Training] SD3 training fixes by @sayakpaul in #8917
🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) by @Snailpong in #8804
[Docs] small fixes to pag guide. by @sayakpaul in #8920
Reflect few contributions on ethical_guidelines.md that were not reflected on #8294 by @mreraser in #8914
[Tests] proper skipping of request caching test by @sayakpaul in #8908
Add attentionless VAE support by @Gothos in #8769
[Benchmarking] check if runner helps to restore benchmarking by @sayakpaul in #8929
Update pipeline test fetcher by @DN6 in #8931
[Tests] reduce the model size in the audioldm2 fast test by @ariG23498 in #7846
fix: checkpoint save issue in advanced dreambooth lora sdxl script by @akbaig in #8926
[Tests] Improve transformers model test suite coverage - Temporal Transformer by @rootonchair in #8932
Fix Colab and Notebook checks for diffusers-cli env by @tolgacangoz in #8408
Fix name when saving text inversion embeddings in dreambooth advanced scripts by @DN6 in #8927
[Core] fix QKV fusion for attention by @sayakpaul in #8829
remove residual i from auraflow. by @sayakpaul in #8949
[CI] Skip flaky download tests in PR CI by @DN6 in #8945
[AuraFlow] fix long prompt handling by @sayakpaul in #8937
Added Code for Gradient Accumulation to work for basic_training by @RandomGamingDev in #8961
[AudioLDM2] Fix cache pos for GPT-2 generation by @sanchit-gandhi in #8964
[Tests] fix slices of 26 tests (first half) by @sayakpaul in #8959
[CI] Slow Test Updates by @DN6 in #8870
[tests] speed up animatediff tests by @a-r-r-o-w in #8846
[LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8774
Update TensorRT img2img community pipeline by @asfiyab-nvidia in #8899
Enable CivitAI SDXL Inpainting Models Conversion by @mazharosama in #8795
Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @yiyixuxu in #8976
fix guidance_scale value not equal to the value in comments by @efwfe in #8941
[Chore] remove all is from auraflow. by @sayakpaul in #8980
[Chore] add LoraLoaderMixin to the inits by @sayakpaul in #8981
Added accelerator based gradient accumulation for basic_example by @RandomGamingDev in #8966
[CI] Fix parallelism in nightly tests by @DN6 in #8983
[CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix by @DN6 in #8986
[fix] FreeInit step index out of bounds by @a-r-r-o-w in #8969
[core] AnimateDiff SparseCtrl by @a-r-r-o-w in #8897
remove unused code from pag attn procs by @a-r-r-o-w in #8928
[Kolors] Add IP Adapter by @asomoza in #8901
[CI] Update runner configuration for setup and nightly tests by @XciD in #9005
[Docs] credit where it's due for Lumina and Latte. by @sayakpaul in #9000
handle lora scale and clip skip in lpw sd and sdxl community pipelines by @noskill in #8988
[LoRA] fix: animate diff lora stuff. by @sayakpaul in #8995
Stable Audio integration by @ylacombe in #8716
[core] Move community AnimateDiff ControlNet to core by @a-r-r-o-w in #8972
Fix Stable Audio repository id by @ylacombe in #9016
PAG variant for AnimateDiff by @a-r-r-o-w in #8789
Updates deps for pipeline test fetcher by @DN6 in #9033
fix load sharded checkpoint from a subfolder (local path) by @yiyixuxu in #8913
[docs] fix pia example by @a-r-r-o-w in #9015
Flux pipeline by @sayakpaul in #9043
[Core] Add PAG support for PixArtSigma by @sayakpaul in #8921
[Flux] allow tests to run by @sayakpaul in #9050
Fix Nightly Deps by @DN6 in #9036
Update transformer_flux.py by @haofanwang in #9060
Errata: Fix typos & \s+$ by @tolgacangoz in #9008
[refactor] create modeling blocks specific to AnimateDiff by @a-r-r-o-w in #8979
Fix grammar mistake. by @prideout in #9072
[Flux] minor documentation fixes for flux. by @sayakpaul in #9048
Update TensorRT txt2img and inpaint community pipelines by @asfiyab-nvidia in #9037
type get_attention_scores as optional in get_attention_scores by @psychedelicious in #9075
[refactor] apply qk norm in attention processors by @a-r-r-o-w in #9071
[FLUX] support LoRA by @sayakpaul in #9057
[Tests] Improve transformers model test suite coverage - Latte by @rootonchair in #8919
PAG variant for HunyuanDiT, PAG refactor by @a-r-r-o-w in #8936
[Docs] add stable cascade unet doc. by @sayakpaul in #9066
add sentencepiece as a soft dependency by @yiyixuxu in #9065
Fix typos by @omahs in #9077
Update CLIPFeatureExtractor to CLIPImageProcessor and DPTFeatureExtractor to DPTImageProcessor by @tolgacangoz in #9002
[Core] add QKV fusion to AuraFlow and PixArt Sigma by @sayakpaul in #8952
[bug] remove unreachable norm_type=ada_norm_continuous from norm3 initialization conditions by @a-r-r-o-w in #9006
[Tests] Improve transformers model test suite coverage - Hunyuan DiT by @rootonchair in #8916
update by @DN6 (direct commit on v0.30.0-release)
[Docs] Add community projects section to docs by @DN6 in #9013
add PAG support for Stable Diffusion 3 by @sunovivid in #8861
Fix loading sharded checkpoints when we have variants by @SunMarc in #9061
[Single File] Add single file support for Flux Transformer by @DN6 in #9083
[Kolors] Add PAG by @asomoza in #8934
fix train_dreambooth_lora_sd3.py loading hook by @sayakpaul in #9107
[core] FreeNoise by @a-r-r-o-w in #8948
Flux fp16 inference fix by @latentCall145 in #9097
[feat] allow sparsectrl to be loaded from single file by @a-r-r-o-w in #9073
Freenoise change vae_batch_size to decode_chunk_size by @DN6 in #9110
Add CogVideoX text-to-video generation model by @zRzRzRzRzRzRzR in #9082
Release: v0.30.0 by @sayakpaul (direct commit on v0.30.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@DN6
- [Tests] Fix precision related issues in slow pipeline tests (#8720)
- Remove legacy single file model loading mixins (#8754)
- Enforce ordering when running Pipeline slow tests (#8763)
- Fix warning in UNetMotionModel (#8756)
- Fix indent in dreambooth lora advanced SD 15 script (#8753)
- Fix mistake in Single File Docs page (#8765)
- [Single File] Allow loading T5 encoder in mixed precision (#8778)
- Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)
- Add VAE tiling option for SD3 (#8791)
- Add single file loading support for AnimateDiff (#8819)
- Add option to SSH into CPU runner. (#8884)
- SSH into cpu runner fix (#8888)
- SSH into cpu runner additional fix (#8893)
- Update pipeline test fetcher (#8931)
- Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)
- [CI] Skip flaky download tests in PR CI (#8945)
- [CI] Slow Test Updates (#8870)
- [CI] Fix parallelism in nightly tests (#8983)
- [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix (#8986)
- Updates deps for pipeline test fetcher (#9033)
- Fix Nightly Deps (#9036)
- update
- [Docs] Add community projects section to docs (#9013)
- [Single File] Add single file support for Flux Transformer (#9083)
- Freenoise change vae_batch_size to decode_chunk_size (#9110)
@shauray8
- add PAG support for SD architecture (#8725)
@gnobitab
- [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)
- [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)
@yiyixuxu
- [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)
- [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding (#8761)
- correct attention_head_dim for JointTransformerBlock (#8608)
- fix loading sharded checkpoints from subfolder (#8798)
- Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8976)
- fix load sharded checkpoint from a subfolder (local path) (#8913)
- add sentencepiece as a soft dependency (#9065)
@PommesPeter
- [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
@IrohXu
- Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)
@maxin-cn
- Latte: Latent Diffusion Transformer for Video Generation (#8404)
@ustcuna
- [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)
@tuanh123789
- add PAG support sd15 controlnet (#8820)
@Snailpong
- 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)
@asfiyab-nvidia
- Update TensorRT img2img community pipeline (#8899)
- Update TensorRT txt2img and inpaint community pipelines (#9037)
@ylacombe
- Stable Audio integration (#8716)
- Fix Stable Audio repository id (#9016)
@sunovivid
- add PAG support for Stable Diffusion 3 (#8861)
@zRzRzRzRzRzRzR
- Add CogVideoX text-to-video generation model (#9082)

huggingface/diffusers v0.30.0 v0.30.0: New Pipelines (Flux, Stable Audio, Kolors, CogVideoX, Latte, and more), New Methods (FreeNoise, SparseCtrl), and New Refactors on GitHub