Latent Consistency Models (LCM)
LCMs enable significantly faster inference for diffusion models: they require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4
images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
```
Refer to the documentation to learn more.
LCM comes with both text-to-image and image-to-image pipelines; they were contributed by @luosiallen, @nagolinc, and @dg845.
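For image-to-image, the flow is analogous. Below is a minimal sketch; the `LatentConsistencyModelImg2ImgPipeline` class name, the init-image path, and the `strength` value are assumptions for illustration, not taken from the snippet above:

```python
import torch
from diffusers import LatentConsistencyModelImg2ImgPipeline
from diffusers.utils import load_image

pipe = LatentConsistencyModelImg2ImgPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

# Placeholder path; substitute any RGB image you want to transform.
init_image = load_image("path/to/init_image.png")

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# As with text-to-image, very few steps are needed; `strength` (assumed to behave
# as in other img2img pipelines) controls how far to move away from the init image.
image = pipe(
    prompt=prompt,
    image=init_image,
    num_inference_steps=4,
    guidance_scale=8.0,
    strength=0.5,
).images[0]
```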
PixArt-Alpha
PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of existing state-of-the-art models, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.
It was trained with T5 text embeddings and has a maximum sequence length of 120, allowing for more detailed prompt inputs and unlocking better-quality generations.
Despite the large text encoder, with model offloading it takes a little under 11 GB of VRAM to run the `PixArtAlphaPipeline`:
```python
import torch
from diffusers import PixArtAlphaPipeline

pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]
image.save("sahara.png")
```
Check out the docs to learn more.
AnimateDiff
AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.
These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a `MotionAdapter` and a `UNetMotionModel`. These serve as a convenient way to use the motion modules with existing Stable Diffusion models.
The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load an SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves, seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
You can convert an existing 2D UNet into a `UNetMotionModel`:
```python
from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel

unet = UNetMotionModel()

# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load the motion adapter while converting the 2D UNet
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)

# Or load the motion modules after init
unet_motion.load_motion_modules(motion_adapter)

# Freeze all 2D UNet layers except for the motion modules for finetuning
unet_motion.freeze_unet2d_params()

# Save only the motion modules
unet_motion.save_motion_modules("<path to save model>", push_to_hub=True)
```
AnimateDiff also comes with motion LoRA modules, letting you control subtleties:
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load an SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves, seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Check out the documentation to learn more.
PEFT 🤝 Diffusers
There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 PEFT integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.
Here is an example of combining multiple LoRAs using this new integration:
```python
import torch
from diffusers import DiffusionPipeline

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Combine the adapters.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])

# Perform inference.
prompt = "toy_face of a hacker with a hoodie, pixel art"
image = pipe(
    prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}, generator=torch.manual_seed(0)
).images[0]
image
```
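Beyond combining adapters, the integration also makes it easy to switch between them or disable them at runtime. A short sketch continuing the example above (the single-adapter prompts are illustrative):

```python
# Switch to a single adapter by name.
pipe.set_adapters("toy")
image = pipe(
    "toy_face of a hacker with a hoodie",
    num_inference_steps=30,
    generator=torch.manual_seed(0),
).images[0]

# Disable all LoRA adapters to recover the base model's behavior.
pipe.disable_lora()
image = pipe("a hacker with a hoodie", num_inference_steps=30).images[0]
```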
Refer to the documentation to learn more.
Community components with community pipelines
We have had support for community pipelines for a while now. This enables fast integration for pipelines we cannot directly integrate within the core codebase of the library. However, community pipelines always rely on the building blocks from Diffusers, which can be restrictive for advanced use cases.
To address this, we're elevating community pipelines with community components starting this release 🤗. By specifying `trust_remote_code=True` and writing the pipeline repository in a specific way, users can customize their pipeline and component code as flexibly as possible:
```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "<change-username>/<change-id>", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "hello"

# Text embeds
prompt_embeds, negative_embeds = pipeline.encode_prompt(prompt)

# Keyframes generation (8x64x40, 2fps)
video_frames = pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    num_frames=8,
    height=40,
    width=64,
    num_inference_steps=2,
    guidance_scale=9.0,
    output_type="pt",
).frames
```
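On the repository side, the custom code is a regular `DiffusionPipeline` subclass that registers its components. Here is a minimal sketch of what such a `pipeline.py` could look like; the class name and the unconditional sampling loop are illustrative assumptions rather than the exact required layout, which the documentation describes:

```python
# pipeline.py inside the pipeline repository (illustrative)
import torch
from diffusers import DiffusionPipeline


class MyCustomPipeline(DiffusionPipeline):
    def __init__(self, unet, scheduler):
        super().__init__()
        # register_modules makes the components loadable/savable with the pipeline
        self.register_modules(unet=unet, scheduler=scheduler)

    @torch.no_grad()
    def __call__(self, batch_size: int = 1, num_inference_steps: int = 50):
        # Minimal unconditional denoising loop
        sample = torch.randn(
            (
                batch_size,
                self.unet.config.in_channels,
                self.unet.config.sample_size,
                self.unet.config.sample_size,
            ),
            device=self.device,
        )
        self.scheduler.set_timesteps(num_inference_steps)
        for t in self.scheduler.timesteps:
            noise_pred = self.unet(sample, t).sample
            sample = self.scheduler.step(noise_pred, t, sample).prev_sample
        return sample
```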
Refer to the documentation to learn more.
Dynamic callbacks
Most 🤗 Diffusers pipelines now accept a `callback_on_step_end` argument that allows you to change the default behavior of the denoising loop with custom-defined functions. Here is an example of a callback function that disables classifier-free guidance after 40% of the inference steps to save compute with a minimal tradeoff in performance.
```python
def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
    # adjust the batch_size of prompt_embeds according to guidance_scale
    if step_index == int(pipe.num_timesteps * 0.4):
        prompt_embeds = callback_kwargs["prompt_embeds"]
        prompt_embeds = prompt_embeds.chunk(2)[-1]

        # update guidance_scale and prompt_embeds
        pipe._guidance_scale = 0.0
        callback_kwargs["prompt_embeds"] = prompt_embeds
    return callback_kwargs
```
Here’s how you can use it:
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(1)
out = pipe(
    prompt,
    generator=generator,
    callback_on_step_end=callback_dynamic_cfg,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
)
out.images[0].save("out_custom_cfg.png")
```
Check out the docs to learn more.
All commits
- [PEFT / LoRA] Fix text encoder scaling by @younesbelkada in #5204
- Fix doc KO unconditional_image_generation.md by @mishig25 in #5236
- Flax: Ignore PyTorch, ONNX files when they coexist with Flax weights by @pcuenca in #5237
- Fixed constants.py not using hugging face hub environment variable by @Zanz2 in #5222
- Compile test fixes by @DN6 in #5235
- [PEFT warnings] Only sure deprecation warnings in the future by @patrickvonplaten in #5240
- Add docstrings in forward methods of adapter model by @Nandika-A in #5253
- make style by @patrickvonplaten (direct commit on main)
- [WIP] Refactor UniDiffuser Pipeline and Tests by @dg845 in #4948
- fix: how print training resume logs. by @sayakpaul in #5117
- Add docstring for the AutoencoderKL's decode by @freespirit in #5242
- Add a docstring for the AutoencoderKL's encode by @freespirit in #5239
- Update UniPC to support 1D diffusion. by @leng-yue in #5199
- [Schedulers] Fix callback steps by @patrickvonplaten in #5261
- make fix copies by @patrickvonplaten (direct commit on main)
- [Research folder] Add SDXL example by @patrickvonplaten in #5275
- Fix UniPC scheduler for 1D by @patrickvonplaten in #5276
- New Pipeline Slow Test runners by @DN6 in #5131
- handle case when controlnet is list or tuple by @noskill in #5179
- make style by @patrickvonplaten (direct commit on main)
- Zh doc by @WADreaming in #4807
- ✨ [Core] Add FreeU mechanism by @kadirnar in #5164
- pin torch version by @DN6 in #5297
- add: entry for DDPO support. by @sayakpaul in #5250
- Min-SNR Gamma: correct the fix for SNR weighted loss in v-prediction … by @bghira in #5238
- Update bug-report.yml by @patrickvonplaten (direct commit on main)
- Bump tolerance on shape test by @DN6 in #5289
- Add from single file to StableDiffusionUpscalePipeline and StableDiffusionLatentUpscalePipeline by @DN6 in #5194
- [LoRA] fix: torch.compile() for lora conv by @sayakpaul in #5298
- [docs] Improved inpaint docs by @stevhliu in #5210
- Minor fixes by @TimothyAlexisVass in #5309
- [Hacktoberfest]Fixing issues #5241 by @jgyfutub in #5255
- Update README.md by @ShubhamJagtap2000 in #5267
- fix typo in train dreambooth lora description by @themez in #5332
- Fix [core/GLIGEN]: TypeError when iterating over 0-d tensor with In-painting mode when EulerAncestralDiscreteScheduler is used by @rchuzh99 in #5305
- fix inference in custom diffusion by @caopulan in #5329
- Improve performance of fast test by reducing down blocks by @sepal in #5290
- make-fast-test-for-StableDiffusionControlNetPipeline-faster by @m0saan in #5292
- Improve typehints and docs in `diffusers/models` by @a-r-r-o-w in #5299
- Add py.typed for PEP 561 compliance by @byarbrough in #5326
- [HacktoberFest] Add missing docstrings to diffusers/models by @a-r-r-o-w in #5248
- make style by @patrickvonplaten (direct commit on main)
- Fix links in docs to adapter code by @johnowhitaker in #5323
- replace references to deprecated KeyArray & PRNGKeyArray by @jakevdp in #5324
- Fix loading broken LoRAs that could give NaN by @patrickvonplaten in #5316
- [JAX] Replace uses of `jnp.array` in types with `jnp.ndarray`. by @hvaara in #4719
- Add missing dependency in requirements file by @juliensimon in #5345
- fix problem of 'accelerator.is_main_process' to run in mutiple GPUs by @jiaqiw09 in #5340
- [docs] Create a mask for inpainting by @stevhliu in #5322
- Adding PyTorch XLA support for sdxl inference by @ssusie in #5273
- [Examples] use loralinear instead of depecrecated lora attn procs. by @sayakpaul in #5331
- Improve typehints and docs in `diffusers/models` by @a-r-r-o-w in #5312
- Fix `StableDiffusionXLImg2ImgPipeline` creation in sdxl tutorial by @soumik12345 in #5367
- I Added Doc-String Into The class. by @hi-sushanta in #5293
- make style by @patrickvonplaten (direct commit on main)
- [docs] Minor fixes by @stevhliu in #5369
- New xformers test runner by @DN6 in #5349
- [Core] Add FreeU to all the core pipelines and their (mostly-used) derivatives by @sayakpaul in #5376
- [core / PEFT / LoRA] Integrate PEFT into Unet by @younesbelkada in #5151
- [Bot] FIX stale.py uses timezone-aware datetime by @sayakpaul in #5396
- [Examples] fix unconditioning generation training example for mixed-precision training by @sayakpaul in #5407
- [Wuerstchen] text to image training script by @kashif in #5052
- [Docs] add docs on peft diffusers integration by @sayakpaul in #5359
- chore: fix typos by @afuetterer in #5386
- [Examples] Update with HFApi by @sayakpaul in #5393
- Add ability to mix usage of T2I-Adapter(s) and ControlNet(s). by @GreggHelt2 in #5362
- make style by @patrickvonplaten (direct commit on main)
- [Core] Fix/pipeline without text encoders for SDXL by @sayakpaul in #5301
- [Examples] Follow up of #5393 by @sayakpaul in #5420
- changed channel parameters for UNET and VAE. Changed configs parameters of CLIPText by @aeros29 in #5370
- Chore: Typo fixed in multiple files by @SusheelThapa in #5422
- Update base image for slow CUDA tests by @DN6 in #5426
- Fix pipe fetcher for slow tests by @DN6 in #5424
- make fix copies by @patrickvonplaten (direct commit on main)
- Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
- [`from_single_file()`] fix: local single file loading. by @sayakpaul in #5440
- Add latent consistency by @patrickvonplaten in #5438
- Update-DeepFloyd-IF-Pipelines-Docstrings by @m0saan in #5304
- style(sdxl): remove identity assignments by @liang-hou in #5418
- Fix the order of width and height of original size in SDXL training script by @linjiapro in #5382
- make style by @patrickvonplaten (direct commit on main)
- Beautiful Doc string added into the UNetMidBlock2D class. by @hi-sushanta in #5389
- make style by @patrickvonplaten (direct commit on main)
- fix une2td ignoring class_labels by @kesimeg in #5401
- Added support to create asymmetrical U-Net structures by @Gothos in #5400
- [PEFT] Fix scale unscale with LoRA adapters by @younesbelkada in #5417
- Make T2I-Adapter downscale padding match the UNet by @RyanJDick in #5435
- Update README.md by @anvilarth in #5497
- fixed SDXL text encoder training bug #5016 by @shyammarjit in #5078
- make style by @patrickvonplaten (direct commit on main)
- [torch.compile] fix graph break problems partially by @sayakpaul in #5453
- Fix Slow Tests by @DN6 in #5469
- Fix typo in controlnet docs by @MrSyee in #5486
- [BUG] in transformer_temporal Fix Bugs by @zideliu in #5496
- [docs] Fix links by @stevhliu in #5499
- fix a few issues in controlnet inpaint pipelines by @yiyixuxu in #5470
- Fixed autoencoder typo by @abhisharsinha in #5500
- [Core] Refactor activation and normalization layers by @sayakpaul in #5493
- Register BaseOutput subclasses as supported torch.utils._pytree nodes by @BowenBao in #5459
- Japanese docs by @isamu-isozaki in #5478
- [docs] General updates by @stevhliu in #5378
- Add Latent Consistency Models Pipeline by @dg845 in #5448
- fix typo by @mymusise in #5505
- fix error of peft lora when xformers enabled by @AnyISalIn in #5506
- fix a bug in 2nd order schedulers when using in ensemble of experts config by @yiyixuxu in #5511
- [Schedulers] Fix 2nd order other than heun by @patrickvonplaten in #5526
- Add a new community pipeline by @nagolinc in #5477
- make style by @patrickvonplaten (direct commit on main)
- Improve typehints and docs in `diffusers/models` by @a-r-r-o-w in #5391
- make fix-copies by @patrickvonplaten (direct commit on main)
- Fix missing punctuation in PHILOSOPHY.md by @RampagingSloth in #5530
- fix a bug on `torch_dtype` argument in `from_single_file` of ControlNetModel by @xuyxu in #5528
- [docs] Loader docs by @stevhliu in #5473
- Add from_pt flag to enable model from PT by @RissyRan in #5501
- Remove multiple if-else statement in the get_activation function. by @hi-sushanta in #5446
- [Tests] Speed up expert of mixture tests by @patrickvonplaten in #5533
- [Tests] Optimize test configurations for faster execution by @p1kit in #5535
- [Remote code] Add functionality to run remote models, schedulers, pipelines by @patrickvonplaten in #5472
- Update train_dreambooth.py - fix typos by @nickkolok in #5539
- correct checkpoint in kandinsky2.2 doc page by @yiyixuxu in #5550
- [Core] fix FreeU disable method by @sayakpaul in #5552
- [docs] Internal classes API by @stevhliu in #5513
- fix error reported 'find_unused_parameters' running in mutiple GPUs by @jiaqiw09 in #5355
- docs: initial pt translation by @SirMonteiro in #5549
- Fix moved _expand_mask function by @patrickvonplaten in #5581
- [PEFT / Tests] Add peft slow tests on push by @younesbelkada in #5419
- Add realfill by @thuanz123 in #5456
- add fix to be able use StableDiffusionXLAdapterPipeline.from_single_file by @pshtif in #5547
- Stabilize DPM++, especially for SDXL and SDE-DPM++ by @LuChengTHU in #5541
- Fix incorrect loading of custom pipeline by @a-r-r-o-w in #5568
- [core / PEFT] Bump transformers min version for PEFT integration by @younesbelkada in #5579
- Fix divide by zero RuntimeWarning by @TimothyAlexisVass in #5543
- [Community Pipelines] add textual inversion support for stable_diffusion_ipex by @miaojinc in #5571
- fix a mistake in text2image training script for kandinsky2.2 by @yiyixuxu in #5244
- Update docker image for xformers by @DN6 in #5597
- [Docs] Fix typos by @standardAI in #5583
- [Docs] Fix typos, improve, update at Tutorials page by @standardAI in #5586
- [docs] Lu lambdas by @stevhliu in #5602
- Update final CPU offloading code for more diffusion pipelines by @clarencechen in #5589
- [Core] enable lora for sdxl adapters too and add slow tests. by @ilisparrow in #5555
- fix by @patrickvonplaten (direct commit on main)
- Remove Redundant Variables from Encoder and Decoder by @hi-sushanta in #5569
- Revert "Fix the order of width and height of original size in SDXL training script" by @patrickvonplaten in #5614
- [PEFT / LoRA] Fix civitai bug when network alpha is an empty dict by @younesbelkada in #5608
- [Docs] Fix typos, improve, update at Get Started page by @standardAI in #5587
- [SDXL Adapter] Revert load lora by @patrickvonplaten in #5615
- [docs] Kandinsky guide by @stevhliu in #4555
- [remote code] document trust remote code. by @sayakpaul in #5620
- [Tests] Fix cpu offload test by @patrickvonplaten in #5626
- [Docs] Fix typos, improve, update at Conceptual Guides page by @standardAI in #5585
- Animatediff Proposal by @DN6 in #5413
- [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page by @standardAI in #5584
- [LCM] Make sure img2img works by @patrickvonplaten in #5632
- Update animatediff docs to include section on Motion LoRAs by @DN6 in #5639
- [Easy] Minor AnimateDiff Doc nits by @sayakpaul in #5640
- fix a bug in `AutoPipeline.from_pipe()` when creating a controlnet pipeline from an existing controlnet by @yiyixuxu in #5638
- [Easy] clean up the LCM docstrings. by @sayakpaul in #5637
- Model loading speed optimization by @RyanJDick in #5635
- Clean up LCM Pipeline and Test Code. by @dg845 in #5641
- [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page by @standardAI in #5627
- [Core] support for tiny autoencoder in img2img by @sayakpaul in #5636
- Remove the redundant line from the adapter.py file. by @hi-sushanta in #5618
- add callbacks to denoising step by @yiyixuxu in #5427
- [Feat] PixArt-Alpha by @sayakpaul in #5642
- correct pipeline class name by @sayakpaul in #5652
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @dg845
- @kadirnar
- ✨ [Core] Add FreeU mechanism (#5164)
- @a-r-r-o-w
- @isamu-isozaki
- Japanese docs (#5478)
- @nagolinc
- Add a new community pipeline (#5477)
- @SirMonteiro
- docs: initial pt translation (#5549)
- @thuanz123
- Add realfill (#5456)
- @standardAI
- [Docs] Fix typos (#5583)
- [Docs] Fix typos, improve, update at Tutorials page (#5586)
- [Docs] Fix typos, improve, update at Get Started page (#5587)
- [Docs] Fix typos, improve, update at Conceptual Guides page (#5585)
- [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page (#5584)
- [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page (#5627)