New pipelines
Image taken from Lumina's GitHub.
This release features many new pipelines. Below, we provide a list:
Audio pipelines
- Stable Audio (thanks to @ylacombe for the contribution through #8716)
Video pipelines
- Latte (thanks to @maxin-cn for the contribution through #8404)
- CogVideoX (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)
Image pipelines
- Lumina (thanks to @PommesPeter for the contribution through #8652)
- Kolors (thanks to @asomoza for the contribution through #8812)
- AuraFlow (thanks to @sayakpaul for the contribution through #8796)
- Flux (thanks to @sayakpaul for the contribution through #9043)
Be sure to check out the respective docs to know more about these pipelines. Some additional pointers are below for curious minds:
- Lumina introduces a new DiT architecture that is multilingual in nature.
- Kolors is inspired by SDXL and is also multilingual in nature.
- Flux introduces the largest (more than 12B parameters!) open-sourced DiT variant available to date. For efficient DreamBooth + LoRA training, we recommend @bghira's guide here.
- We have worked on a guide that shows how to quantize these large pipelines for memory efficiency with `optimum.quanto`. Check it out here (a minimal sketch follows this list).
- CogVideoX introduces a novel and truly 3D VAE into Diffusers.
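For a quick sense of what that quantization flow looks like, here is a minimal sketch, assuming `optimum-quanto` is installed; the checkpoint id, prompt, and step count below are illustrative:

```python
import torch
from diffusers import FluxPipeline
from optimum.quanto import freeze, qfloat8, quantize

# Load in bf16 first; the transformer (DiT) is the component that dominates memory.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)

# Quantize the transformer weights to FP8 and freeze them in place.
quantize(pipe.transformer, weights=qfloat8)
freeze(pipe.transformer)

pipe.enable_model_cpu_offload()
image = pipe(
    "a tiny astronaut hatching from an egg on the moon",
    guidance_scale=0.0,
    num_inference_steps=4,
).images[0]
image.save("flux-quanto.png")
```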
Perturbed Attention Guidance (PAG)
Comparison images: without PAG (left) vs. with PAG (right).
We already had community pipelines for PAG, but given its usefulness, we decided to make it a first-class citizen of the library. We have a central usage guide for PAG here, which should be the entry point for anyone interested in understanding and using PAG for their use case. We currently support the following pipelines with PAG (a minimal usage sketch follows the list):
- `StableDiffusionPAGPipeline`
- `StableDiffusion3PAGPipeline`
- `StableDiffusionControlNetPAGPipeline`
- `StableDiffusionXLPAGPipeline`
- `StableDiffusionXLPAGImg2ImgPipeline`
- `StableDiffusionXLPAGInpaintPipeline`
- `StableDiffusionXLControlNetPAGPipeline`
- `PixArtSigmaPAGPipeline`
- `HunyuanDiTPAGPipeline`
- `AnimateDiffPAGPipeline`
- `KolorsPAGPipeline`
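For the pipelines listed above, PAG is enabled at load time; here is a minimal sketch (the checkpoint id, applied layers, and scales are illustrative):

```python
import torch
from diffusers import AutoPipelineForText2Image

# enable_pag=True routes loading to the PAG variant of the pipeline.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    enable_pag=True,
    pag_applied_layers=["mid"],
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "an insect robot preparing a delicious meal, anime style",
    num_inference_steps=25,
    guidance_scale=7.0,
    pag_scale=3.0,  # strength of the perturbed-attention guidance
).images[0]
image.save("pag.png")
```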
If you're interested in helping us extend PAG support to other pipelines, please check out this thread.
Special thanks to Ahn Donghoon (@sunovivid), the author of PAG, for helping us with the integration and adding PAG support to SD3.
AnimateDiff with SparseCtrl
SparseCtrl brings controllability to text-to-video diffusion models through signals such as line/edge sketches, depth maps, and RGB images. Inspired by ControlNet, it incorporates an additional condition encoder into the AnimateDiff framework to process these signals. It can be applied to a diverse set of applications, such as interpolation or video prediction (filling in the gaps between a sequence of images for animation), personalized image animation, sketch-to-video, depth-to-video, and more. It was introduced in **SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.**
The authors have made two SparseCtrl-specific checkpoints and a Motion LoRA available.
Scribble Interpolation Example:
import torch
from diffusers import AnimateDiffSparseControlNetPipeline, AutoencoderKL, MotionAdapter, SparseControlNetModel
from diffusers.schedulers import DPMSolverMultistepScheduler
from diffusers.utils import export_to_gif, load_image
device = "cuda"

# Load the motion adapter, SparseCtrl condition encoder, and fine-tuned VAE in fp16
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16).to(device)
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16).to(device)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to(device)
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
).to(device)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, beta_schedule="linear", algorithm_type="dpmsolver++", use_karras_sigmas=True)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-v1-5-3", adapter_name="motion_lora")
pipe.fuse_lora(lora_scale=1.0)
prompt = "an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality"
negative_prompt = "low quality, worst quality, letterboxed"
image_files = [
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png",
]
condition_frame_indices = [0, 8, 15]
conditioning_frames = [load_image(img_file) for img_file in image_files]
video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    conditioning_frames=conditioning_frames,
    controlnet_conditioning_scale=1.0,
    controlnet_frame_indices=condition_frame_indices,
    generator=torch.Generator().manual_seed(1337),
).frames[0]
export_to_gif(video, "output.gif")
Check out the docs here.
FreeNoise for AnimateDiff
FreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.
Instead of initializing noises for all frames, FreeNoise reschedules a sequence of noises for long-range correlation and performs temporal attention over them using a window-based function. We have added FreeNoise to the AnimateDiff family of models in Diffusers, allowing them to generate videos beyond their default 32 frame limit.
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, EulerAncestralDiscreteScheduler
from diffusers.utils import export_to_gif
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16)
pipe.scheduler = EulerAncestralDiscreteScheduler(
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)
pipe.enable_free_noise()  # re-schedules noise and applies window-based temporal attention
pipe.vae.enable_slicing()  # reduce VAE memory usage when decoding many frames
pipe.enable_model_cpu_offload()
frames = pipe(
    "An astronaut riding a horse on Mars.",
    num_frames=64,
    num_inference_steps=20,
    guidance_scale=7.0,
    decode_chunk_size=2,
).frames[0]
export_to_gif(frames, "freenoise-64.gif")
LoRA refactor
We have significantly refactored the loader classes associated with LoRA. Going forward, this will help in adding LoRA support for new pipelines and models. We now have a `LoraBaseMixin` class, which is subclassed by the different pipeline-level LoRA loading classes such as `StableDiffusionXLLoraLoaderMixin`. This document provides an overview of the available classes.
Additionally, we have increased the coverage of methods within the `PeftAdapterMixin` class. This refactoring allows all the supported models to share common LoRA functionalities such as `set_adapter()`, `add_adapter()`, and so on.
To learn more details, please follow this PR. If you see any LoRA-related issues stemming from these refactors, please open an issue.
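To give a feel for the shared surface, here is a minimal sketch of the pipeline-level LoRA API that sits on top of these mixins; the LoRA repo id and adapter name are hypothetical placeholders:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# StableDiffusionXLLoraLoaderMixin (a LoraBaseMixin subclass) handles loading.
# "your-username/your-sdxl-lora" is a placeholder, not a real checkpoint.
pipe.load_lora_weights("your-username/your-sdxl-lora", adapter_name="style")

# Adapter management shared through LoraBaseMixin / PeftAdapterMixin.
pipe.set_adapters(["style"], adapter_weights=[0.8])
image = pipe("a cat wearing a beret, watercolor", num_inference_steps=25).images[0]
```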
🚨 Fixing attention projection fusion
We discovered that the implementation of `fuse_qkv_projections()` was broken. This was fixed in this PR. Additionally, this PR added fusion support to AuraFlow and PixArt Sigma. Reasoning about where this kind of fusion can be useful is available here.
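For context, QKV fusion remains opt-in; a minimal sketch (the checkpoint id and prompt are illustrative) looks like this:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Fuse the Q, K, V projections into a single projection so attention runs
# fewer, larger matmuls (useful together with torch.compile, for example).
pipe.fuse_qkv_projections()
image = pipe("an astronaut tending a rooftop garden", num_inference_steps=30).images[0]

# Optionally restore the original, unfused projections afterwards.
pipe.unfuse_qkv_projections()
```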
All commits
- [Release notification] add some info when there is an error. by @sayakpaul in #8718
- Modify FlowMatch Scale Noise by @asomoza in #8678
- Fix json WindowsPath crash by @vincedovy in #8662
- Motion Model / Adapter versatility by @Arlaz in #8301
- [Chore] perform better deprecation for vqmodeloutput by @sayakpaul in #8719
- [Advanced dreambooth lora] adjustments to align with canonical script by @linoytsaban in #8406
- [Tests] Fix precision related issues in slow pipeline tests by @DN6 in #8720
- fix: ValueError when using FromOriginalModelMixin in subclasses #8440 by @fkcptlst in #8454
- [Community pipeline] SD3 Differential Diffusion Img2Img Pipeline by @asomoza in #8679
- Benchmarking workflow fix by @sayakpaul in #8389
- add PAG support for SD architecture by @shauray8 in #8725
- shift cache in benchmarking. by @sayakpaul in #8740
- [train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8476
- fix the LR schedulers for `dreambooth_lora` by @WenheLI in #8510
- [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support by @gnobitab in #8747
- Always raise from previous error by @Wauplin in #8751
- [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart by @yiyixuxu in #8735
- Remove legacy single file model loading mixins by @DN6 in #8754
- Allow from_transformer in SD3ControlNetModel by @haofanwang in #8749
- [SD3 LoRA Training] Fix errors when not training text encoders by @asomoza in #8743
- [Tests] add test suite for SD3 DreamBooth by @sayakpaul in #8650
- [hunyuan-dit] refactor `HunyuanCombinedTimestepTextSizeStyleEmbedding` by @yiyixuxu in #8761
- Enforce ordering when running Pipeline slow tests by @DN6 in #8763
- Fix warning in UNetMotionModel by @DN6 in #8756
- Fix indent in dreambooth lora advanced SD 15 script by @DN6 in #8753
- Fix mistake in Single File Docs page by @DN6 in #8765
- Reflect few contributions on `philosophy.md` that were not reflected on #8294 by @mreraser in #8690
- correct `attention_head_dim` for `JointTransformerBlock` by @yiyixuxu in #8608
- [LoRA] introduce `LoraBaseMixin` to promote reusability. by @sayakpaul in #8670
- Revert "[LoRA] introduce `LoraBaseMixin` to promote reusability." by @sayakpaul in #8773
- Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab by @sayakpaul in #8762
- Update README.md to include Colab link by @sayakpaul in #8775
- [Chore] add dummy lora attention processors to prevent failures in other libs by @sayakpaul in #8777
- [advanced dreambooth lora] add clip_skip arg by @linoytsaban in #8715
- [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet by @gnobitab in #8783
- Fix minor bug in SD3 img2img test by @a-r-r-o-w in #8779
- [Tests] fix sharding tests by @sayakpaul in #8764
- Add vae_roundtrip.py example by @thomaseding in #7104
- [Single File] Allow loading T5 encoder in mixed precision by @DN6 in #8778
- Fix saving text encoder weights and kohya weights in advanced dreambooth lora script by @DN6 in #8766
- Improve model card for `push_to_hub` trainers by @apolinario in #8697
- fix loading sharded checkpoints from subfolder by @yiyixuxu in #8798
- [Alpha-VLLM Team] Add Lumina-T2X to diffusers by @PommesPeter in #8652
- Fix static typing and doc typos by @zhuoqun-chen in #8807
- Remove unnecessary lines by @tolgacangoz in #8569
- Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference by @IrohXu in #8709
- [Tests] fix more sharding tests by @sayakpaul in #8797
- Reformat docstring for `get_timestep_embedding` by @alanhdu in #8811
- Latte: Latent Diffusion Transformer for Video Generation by @maxin-cn in #8404
- [Core] Add Kolors by @asomoza in #8812
- [Core] Add AuraFlow by @sayakpaul in #8796
- Add VAE tiling option for SD3 by @DN6 in #8791
- Add single file loading support for AnimateDiff by @DN6 in #8819
- [Docs] add AuraFlow docs by @sayakpaul in #8851
- [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU by @ustcuna in #8643
- add PAG support sd15 controlnet by @tuanh123789 in #8820
- [tests] fix typo in pag tests by @a-r-r-o-w in #8845
- [Docker] include python3.10 dev and solve header missing problem by @sayakpaul in #8865
- [`Cont'd`] Add the SDE variant of `DPM-Solver` and DPM-Solver++ to DPM Single Step by @tolgacangoz in #8269
- modify pocs. by @sayakpaul in #8867
- [Core] fix: shard loading and saving when variant is provided. by @sayakpaul in #8869
- [Chore] allow auraflow latest to be torch compile compatible. by @sayakpaul in #8859
- Add AuraFlowPipeline and KolorsPipeline to auto map by @Beinsezii in #8849
- Fix multi-gpu case for `train_cm_ct_unconditional.py` by @tolgacangoz in #8653
- [docs] pipeline docs for latte by @a-r-r-o-w in #8844
- [Chore] add disable forward chunking to SD3 transformer. by @sayakpaul in #8838
- [Core] remove `resume_download` from Hub related stuff by @sayakpaul in #8648
- Add option to SSH into CPU runner. by @DN6 in #8884
- SSH into cpu runner fix by @DN6 in #8888
- SSH into cpu runner additional fix by @DN6 in #8893
- [SDXL] Fix uncaught error with image to image by @asomoza in #8856
- fix loop bug in SlicedAttnProcessor by @shinetzh in #8836
- [fix code annotation] Adjust the dimensions of the rotary positional embedding. by @wangqixun in #8890
- allow tensors in several schedulers step() call by @catwell in #8905
- Use model_info.id instead of model_info.modelId by @Wauplin in #8912
- [Training] SD3 training fixes by @sayakpaul in #8917
- [i18n-KO] Translated docs to Korean (added 7 docs and etc) by @Snailpong in #8804
- [Docs] small fixes to pag guide. by @sayakpaul in #8920
- Reflect few contributions on `ethical_guidelines.md` that were not reflected on #8294 by @mreraser in #8914
- [Tests] proper skipping of request caching test by @sayakpaul in #8908
- Add attentionless VAE support by @Gothos in #8769
- [Benchmarking] check if runner helps to restore benchmarking by @sayakpaul in #8929
- Update pipeline test fetcher by @DN6 in #8931
- [Tests] reduce the model size in the audioldm2 fast test by @ariG23498 in #7846
- fix: checkpoint save issue in advanced dreambooth lora sdxl script by @akbaig in #8926
- [Tests] Improve transformers model test suite coverage - Temporal Transformer by @rootonchair in #8932
- Fix Colab and Notebook checks for `diffusers-cli env` by @tolgacangoz in #8408
- Fix name when saving text inversion embeddings in dreambooth advanced scripts by @DN6 in #8927
- [Core] fix QKV fusion for attention by @sayakpaul in #8829
- remove residual i from auraflow. by @sayakpaul in #8949
- [CI] Skip flaky download tests in PR CI by @DN6 in #8945
- [AuraFlow] fix long prompt handling by @sayakpaul in #8937
- Added Code for Gradient Accumulation to work for basic_training by @RandomGamingDev in #8961
- [AudioLDM2] Fix cache pos for GPT-2 generation by @sanchit-gandhi in #8964
- [Tests] fix slices of 26 tests (first half) by @sayakpaul in #8959
- [CI] Slow Test Updates by @DN6 in #8870
- [tests] speed up animatediff tests by @a-r-r-o-w in #8846
- [LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8774
- Update TensorRT img2img community pipeline by @asfiyab-nvidia in #8899
- Enable CivitAI SDXL Inpainting Models Conversion by @mazharosama in #8795
- Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @yiyixuxu in #8976
- fix guidance_scale value not equal to the value in comments by @efwfe in #8941
- [Chore] remove all is from auraflow. by @sayakpaul in #8980
- [Chore] add `LoraLoaderMixin` to the inits by @sayakpaul in #8981
- Added `accelerator` based gradient accumulation for basic_example by @RandomGamingDev in #8966
- [CI] Fix parallelism in nightly tests by @DN6 in #8983
- [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix by @DN6 in #8986
- [fix] FreeInit step index out of bounds by @a-r-r-o-w in #8969
- [core] AnimateDiff SparseCtrl by @a-r-r-o-w in #8897
- remove unused code from pag attn procs by @a-r-r-o-w in #8928
- [Kolors] Add IP Adapter by @asomoza in #8901
- [CI] Update runner configuration for setup and nightly tests by @XciD in #9005
- [Docs] credit where it's due for Lumina and Latte. by @sayakpaul in #9000
- handle lora scale and clip skip in lpw sd and sdxl community pipelines by @noskill in #8988
- [LoRA] fix: animate diff lora stuff. by @sayakpaul in #8995
- Stable Audio integration by @ylacombe in #8716
- [core] Move community AnimateDiff ControlNet to core by @a-r-r-o-w in #8972
- Fix Stable Audio repository id by @ylacombe in #9016
- PAG variant for AnimateDiff by @a-r-r-o-w in #8789
- Updates deps for pipeline test fetcher by @DN6 in #9033
- fix load sharded checkpoint from a subfolder (local path) by @yiyixuxu in #8913
- [docs] fix pia example by @a-r-r-o-w in #9015
- Flux pipeline by @sayakpaul in #9043
- [Core] Add PAG support for PixArtSigma by @sayakpaul in #8921
- [Flux] allow tests to run by @sayakpaul in #9050
- Fix Nightly Deps by @DN6 in #9036
- Update transformer_flux.py by @haofanwang in #9060
- Errata: Fix typos & `\s+$` by @tolgacangoz in #9008
- [refactor] create modeling blocks specific to AnimateDiff by @a-r-r-o-w in #8979
- Fix grammar mistake. by @prideout in #9072
- [Flux] minor documentation fixes for flux. by @sayakpaul in #9048
- Update TensorRT txt2img and inpaint community pipelines by @asfiyab-nvidia in #9037
- type `get_attention_scores` as optional in `get_attention_scores` by @psychedelicious in #9075
- [refactor] apply qk norm in attention processors by @a-r-r-o-w in #9071
- [FLUX] support LoRA by @sayakpaul in #9057
- [Tests] Improve transformers model test suite coverage - Latte by @rootonchair in #8919
- PAG variant for HunyuanDiT, PAG refactor by @a-r-r-o-w in #8936
- [Docs] add stable cascade unet doc. by @sayakpaul in #9066
- add sentencepiece as a soft dependency by @yiyixuxu in #9065
- Fix typos by @omahs in #9077
- Update `CLIPFeatureExtractor` to `CLIPImageProcessor` and `DPTFeatureExtractor` to `DPTImageProcessor` by @tolgacangoz in #9002
- [Core] add QKV fusion to AuraFlow and PixArt Sigma by @sayakpaul in #8952
- [bug] remove unreachable norm_type=ada_norm_continuous from norm3 initialization conditions by @a-r-r-o-w in #9006
- [Tests] Improve transformers model test suite coverage - Hunyuan DiT by @rootonchair in #8916
- update by @DN6 (direct commit on v0.30.0-release)
- [Docs] Add community projects section to docs by @DN6 in #9013
- add PAG support for Stable Diffusion 3 by @sunovivid in #8861
- Fix loading sharded checkpoints when we have variants by @SunMarc in #9061
- [Single File] Add single file support for Flux Transformer by @DN6 in #9083
- [Kolors] Add PAG by @asomoza in #8934
- fix train_dreambooth_lora_sd3.py loading hook by @sayakpaul in #9107
- [core] FreeNoise by @a-r-r-o-w in #8948
- Flux fp16 inference fix by @latentCall145 in #9097
- [feat] allow sparsectrl to be loaded from single file by @a-r-r-o-w in #9073
- Freenoise change `vae_batch_size` to `decode_chunk_size` by @DN6 in #9110
- Add CogVideoX text-to-video generation model by @zRzRzRzRzRzRzR in #9082
- Release: v0.30.0 by @sayakpaul (direct commit on v0.30.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @DN6
- [Tests] Fix precision related issues in slow pipeline tests (#8720)
- Remove legacy single file model loading mixins (#8754)
- Enforce ordering when running Pipeline slow tests (#8763)
- Fix warning in UNetMotionModel (#8756)
- Fix indent in dreambooth lora advanced SD 15 script (#8753)
- Fix mistake in Single File Docs page (#8765)
- [Single File] Allow loading T5 encoder in mixed precision (#8778)
- Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)
- Add VAE tiling option for SD3 (#8791)
- Add single file loading support for AnimateDiff (#8819)
- Add option to SSH into CPU runner. (#8884)
- SSH into cpu runner fix (#8888)
- SSH into cpu runner additional fix (#8893)
- Update pipeline test fetcher (#8931)
- Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)
- [CI] Skip flaky download tests in PR CI (#8945)
- [CI] Slow Test Updates (#8870)
- [CI] Fix parallelism in nightly tests (#8983)
- [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix (#8986)
- Updates deps for pipeline test fetcher (#9033)
- Fix Nightly Deps (#9036)
- update
- [Docs] Add community projects section to docs (#9013)
- [Single File] Add single file support for Flux Transformer (#9083)
- Freenoise change `vae_batch_size` to `decode_chunk_size` (#9110)
- @shauray8
- add PAG support for SD architecture (#8725)
- @gnobitab
- [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)
- [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)
- @yiyixuxu
- [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)
- [hunyuan-dit] refactor `HunyuanCombinedTimestepTextSizeStyleEmbedding` (#8761)
- correct `attention_head_dim` for `JointTransformerBlock` (#8608)
- fix loading sharded checkpoints from subfolder (#8798)
- Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8976)
- fix load sharded checkpoint from a subfolder (local path) (#8913)
- add sentencepiece as a soft dependency (#9065)
- @PommesPeter
- [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
- @IrohXu
- Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)
- @maxin-cn
- Latte: Latent Diffusion Transformer for Video Generation (#8404)
- @ustcuna
- [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)
- @tuanh123789
- add PAG support sd15 controlnet (#8820)
- @Snailpong
- [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)
- @asfiyab-nvidia
- Update TensorRT img2img community pipeline (#8899)
- Update TensorRT txt2img and inpaint community pipelines (#9037)
- @ylacombe
- Stable Audio integration (#8716)
- Fix Stable Audio repository id (#9016)
- @sunovivid
- add PAG support for Stable Diffusion 3 (#8861)
- @zRzRzRzRzRzRzR
- Add CogVideoX text-to-video generation model (#9082)