Latent Consistency Models (LCM)
LCMs enable significantly faster inference for diffusion models: they require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4
images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
```
Refer to the documentation to learn more.
LCM comes with both text-to-image and image-to-image pipelines; they were contributed by @luosiallen, @nagolinc, and @dg845.
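For image-to-image, the flow is analogous. Below is a minimal sketch; the `LatentConsistencyModelImg2ImgPipeline` class name, the init-image path, and the `strength` value are assumptions for illustration, not taken from the snippet above:

```python
import torch
from diffusers import LatentConsistencyModelImg2ImgPipeline
from diffusers.utils import load_image

pipe = LatentConsistencyModelImg2ImgPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7")
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

# Placeholder path; substitute any RGB image you want to transform.
init_image = load_image("path/to/init_image.png")

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# As with text-to-image, very few steps are needed; `strength` (assumed to behave
# as in other img2img pipelines) controls how far to move away from the init image.
image = pipe(
    prompt=prompt,
    image=init_image,
    num_inference_steps=4,
    guidance_scale=8.0,
    strength=0.5,
).images[0]
```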
PixArt-Alpha
PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of existing state-of-the-art models, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.
It was trained with T5 text embeddings and has a maximum sequence length of 120, allowing for more detailed prompt inputs and unlocking better-quality generations.
Despite the large text encoder, with model offloading it takes a little under 11 GB of VRAM to run the `PixArtAlphaPipeline`:
```python
import torch
from diffusers import PixArtAlphaPipeline

pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]
image.save("sahara.png")
```
Check out the docs to learn more.
AnimateDiff
AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.
These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a `MotionAdapter` and a `UNetMotionModel`. These serve as a convenient way to use the motion modules with existing Stable Diffusion models.
The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load an SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves, seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
You can convert an existing 2D UNet into a `UNetMotionModel`:
```python
from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel

unet = UNetMotionModel()

# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load the motion adapter while converting the 2D UNet
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)

# Or load the motion modules after init
unet_motion.load_motion_modules(motion_adapter)

# Freeze all 2D UNet layers except for the motion modules for finetuning
unet_motion.freeze_unet2d_params()

# Save only the motion modules
unet_motion.save_motion_modules("<path to save model>", push_to_hub=True)
```
AnimateDiff also comes with motion LoRA modules, letting you control subtleties:
```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load an SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves, seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
Check out the documentation to learn more.
PEFT 🤝 Diffusers
There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 PEFT integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.
Here is an example of combining multiple LoRAs using this new integration:
```python
import torch
from diffusers import DiffusionPipeline

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")
# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Combine the adapters.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])

# Perform inference.
prompt = "toy_face of a hacker with a hoodie, pixel art"
image = pipe(
    prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}, generator=torch.manual_seed(0)
).images[0]
image
```
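Beyond combining adapters, the integration also makes it easy to switch between them or disable them at runtime. A short sketch continuing the example above (the single-adapter prompts are illustrative):

```python
# Switch to a single adapter by name.
pipe.set_adapters("toy")
image = pipe(
    "toy_face of a hacker with a hoodie",
    num_inference_steps=30,
    generator=torch.manual_seed(0),
).images[0]

# Disable all LoRA adapters to recover the base model's behavior.
pipe.disable_lora()
image = pipe("a hacker with a hoodie", num_inference_steps=30).images[0]
```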
Refer to the documentation to learn more.
Community components with community pipelines
We have had support for community pipelines for a while now. This enables fast integration for pipelines we cannot directly integrate within the core codebase of the library. However, community pipelines always rely on the building blocks from Diffusers, which can be restrictive for advanced use cases.
To address this, we're elevating community pipelines with community components starting this release 🤗. By specifying `trust_remote_code=True` and writing the pipeline repository in a specific way, users can customize their pipeline and component code as flexibly as possible:
```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "<change-username>/<change-id>", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "hello"

# Text embeds
prompt_embeds, negative_embeds = pipeline.encode_prompt(prompt)

# Keyframes generation (8x64x40, 2fps)
video_frames = pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    num_frames=8,
    height=40,
    width=64,
    num_inference_steps=2,
    guidance_scale=9.0,
    output_type="pt",
).frames
```
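On the repository side, the custom code is a regular `DiffusionPipeline` subclass that registers its components. Here is a minimal sketch of what such a `pipeline.py` could look like; the class name and the unconditional sampling loop are illustrative assumptions rather than the exact required layout, which the documentation describes:

```python
# pipeline.py inside the pipeline repository (illustrative)
import torch
from diffusers import DiffusionPipeline


class MyCustomPipeline(DiffusionPipeline):
    def __init__(self, unet, scheduler):
        super().__init__()
        # register_modules makes the components loadable/savable with the pipeline
        self.register_modules(unet=unet, scheduler=scheduler)

    @torch.no_grad()
    def __call__(self, batch_size: int = 1, num_inference_steps: int = 50):
        # Minimal unconditional denoising loop
        sample = torch.randn(
            (
                batch_size,
                self.unet.config.in_channels,
                self.unet.config.sample_size,
                self.unet.config.sample_size,
            ),
            device=self.device,
        )
        self.scheduler.set_timesteps(num_inference_steps)
        for t in self.scheduler.timesteps:
            noise_pred = self.unet(sample, t).sample
            sample = self.scheduler.step(noise_pred, t, sample).prev_sample
        return sample
```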
Refer to the documentation to learn more.
Dynamic callbacks
Most 🤗 Diffusers pipelines now accept a `callback_on_step_end` argument that allows you to change the default behavior of the denoising loop with custom-defined functions. Here is an example of a callback function that disables classifier-free guidance after 40% of the inference steps to save compute with a minimal tradeoff in performance.
```python
def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
    # adjust the batch_size of prompt_embeds according to guidance_scale
    if step_index == int(pipe.num_timesteps * 0.4):
        prompt_embeds = callback_kwargs["prompt_embeds"]
        prompt_embeds = prompt_embeds.chunk(2)[-1]

        # update guidance_scale and prompt_embeds
        pipe._guidance_scale = 0.0
        callback_kwargs["prompt_embeds"] = prompt_embeds
    return callback_kwargs
```
Here’s how you can use it:
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(1)
out = pipe(
    prompt,
    generator=generator,
    callback_on_step_end=callback_dynamic_cfg,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
)
out.images[0].save("out_custom_cfg.png")
```
Check out the docs to learn more.
All commits
- [PEFT / LoRA] Fix text encoder scaling by @younesbelkada in #5204
- Fix doc KO unconditional_image_generation.md by @mishig25 in #5236
- Flax: Ignore PyTorch, ONNX files when they coexist with Flax weights by @pcuenca in #5237
- Fixed constants.py not using hugging face hub environment variable by @Zanz2 in #5222
- Compile test fixes by @DN6 in #5235
- [PEFT warnings] Only sure deprecation warnings in the future by @patrickvonplaten in #5240
- Add docstrings in forward methods of adapter model by @Nandika-A in #5253
- make style by @patrickvonplaten (direct commit on main)
- [WIP] Refactor UniDiffuser Pipeline and Tests by @dg845 in #4948
- fix: how print training resume logs. by @sayakpaul in #5117
- Add docstring for the AutoencoderKL's decode by @freespirit in #5242
- Add a docstring for the AutoencoderKL's encode by @freespirit in #5239
- Update UniPC to support 1D diffusion. by @leng-yue in #5199
- [Schedulers] Fix callback steps by @patrickvonplaten in #5261
- make fix copies by @patrickvonplaten (direct commit on main)
- [Research folder] Add SDXL example by @patrickvonplaten in #5275
- Fix UniPC scheduler for 1D by @patrickvonplaten in #5276
- New Pipeline Slow Test runners by @DN6 in #5131
- handle case when controlnet is list or tuple by @noskill in #5179
- make style by @patrickvonplaten (direct commit on main)
- Zh doc by @WADreaming in #4807
- ✨ [Core] Add FreeU mechanism by @kadirnar in #5164
- pin torch version by @DN6 in #5297
- add: entry for DDPO support. by @sayakpaul in #5250
- Min-SNR Gamma: correct the fix for SNR weighted loss in v-prediction … by @bghira in #5238
- Update bug-report.yml by @patrickvonplaten (direct commit on main)
- Bump tolerance on shape test by @DN6 in #5289
- Add from single file to StableDiffusionUpscalePipeline and StableDiffusionLatentUpscalePipeline by @DN6 in #5194
- [LoRA] fix: torch.compile() for lora conv by @sayakpaul in #5298
- [docs] Improved inpaint docs by @stevhliu in #5210
- Minor fixes by @TimothyAlexisVass in #5309
- [Hacktoberfest]Fixing issues #5241 by @jgyfutub in #5255
- Update README.md by @ShubhamJagtap2000 in #5267
- fix typo in train dreambooth lora description by @themez in #5332
- Fix [core/GLIGEN]: TypeError when iterating over 0-d tensor with In-painting mode when EulerAncestralDiscreteScheduler is used by @rchuzh99 in #5305
- fix inference in custom diffusion by @caopulan in #5329
- Improve performance of fast test by reducing down blocks by @sepal in #5290
- make-fast-test-for-StableDiffusionControlNetPipeline-faster by @m0saan in #5292
- Improve typehints and docs in `diffusers/models` by @a-r-r-o-w in #5299
- Add py.typed for PEP 561 compliance by @byarbrough in #5326
- [HacktoberFest] Add missing docstrings to diffusers/models by @a-r-r-o-w in #5248
- make style by @patrickvonplaten (direct commit on main)
- Fix links in docs to adapter code by @johnowhitaker in #5323
- replace references to deprecated KeyArray & PRNGKeyArray by @jakevdp in #5324
- Fix loading broken LoRAs that could give NaN by @patrickvonplaten in #5316
- [JAX] Replace uses of `jnp.array` in types with `jnp.ndarray`. by @hvaara in #4719
- Add missing dependency in requirements file by @juliensimon in #5345
- fix problem of 'accelerator.is_main_process' to run in mutiple GPUs by @jiaqiw09 in #5340
- [docs] Create a mask for inpainting by @stevhliu in #5322
- Adding PyTorch XLA support for sdxl inference by @ssusie in #5273
- [Examples] use loralinear instead of depecrecated lora attn procs. by @sayakpaul in #5331
- Improve typehints and docs in `diffusers/models` by @a-r-r-o-w in #5312
- Fix `StableDiffusionXLImg2ImgPipeline` creation in sdxl tutorial by @soumik12345 in #5367
- I Added Doc-String Into The class. by @hi-sushanta in #5293
- make style by @patrickvonplaten (direct commit on main)
- [docs] Minor fixes by @stevhliu in #5369
- New xformers test runner by @DN6 in #5349
- [Core] Add FreeU to all the core pipelines and their (mostly-used) derivatives by @sayakpaul in #5376
- [core / PEFT / LoRA] Integrate PEFT into Unet by @younesbelkada in #5151
- [Bot] FIX stale.py uses timezone-aware datetime by @sayakpaul in #5396
- [Examples] fix unconditioning generation training example for mixed-precision training by @sayakpaul in #5407
- [Wuerstchen] text to image training script by @kashif in #5052
- [Docs] add docs on peft diffusers integration by @sayakpaul in #5359
- chore: fix typos by @afuetterer in #5386
- [Examples] Update with HFApi by @sayakpaul in #5393
- Add ability to mix usage of T2I-Adapter(s) and ControlNet(s). by @GreggHelt2 in #5362
- make style by @patrickvonplaten (direct commit on main)
- [Core] Fix/pipeline without text encoders for SDXL by @sayakpaul in #5301
- [Examples] Follow up of #5393 by @sayakpaul in #5420
- changed channel parameters for UNET and VAE. Changed configs parameters of CLIPText by @aeros29 in #5370
- Chore: Typo fixed in multiple files by @SusheelThapa in #5422
- Update base image for slow CUDA tests by @DN6 in #5426
- Fix pipe fetcher for slow tests by @DN6 in #5424
- make fix copies by @patrickvonplaten (direct commit on main)
- Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
- [`from_single_file()`] fix: local single file loading. by @sayakpaul in #5440
- Add latent consistency by @patrickvonplaten in #5438
- Update-DeepFloyd-IF-Pipelines-Docstrings by @m0saan in #5304
- style(sdxl): remove identity assignments by @liang-hou in #5418
- Fix the order of width and height of original size in SDXL training script by @linjiapro in #5382
- make style by @patrickvonplaten (direct commit on main)
- Beautiful Doc string added into the UNetMidBlock2D class. by @hi-sushanta in #5389
- make style by @patrickvonplaten (direct commit on main)
- fix une2td ignoring class_labels by @kesimeg in #5401
- Added support to create asymmetrical U-Net structures by @Gothos in #5400
- [PEFT] Fix scale unscale with LoRA adapters by @younesbelkada in #5417
- Make T2I-Adapter downscale padding match the UNet by @RyanJDick in #5435
- Update README.md by @anvilarth in #5497
- fixed SDXL text encoder training bug #5016 by @shyammarjit in #5078
- make style by @patrickvonplaten (direct commit on main)
- [torch.compile] fix graph break problems partially by @sayakpaul in #5453
- Fix Slow Tests by @DN6 in #5469
- Fix typo in controlnet docs by @MrSyee in #5486
- [BUG] in transformer_temporal Fix Bugs by @zideliu in #5496
- [docs] Fix links by @stevhliu in #5499
- fix a few issues in controlnet inpaint pipelines by @yiyixuxu in #5470
- Fixed autoencoder typo by @abhisharsinha in #5500
- [Core] Refactor activation and normalization layers by @sayakpaul in #5493
- Register BaseOutput subclasses as supported torch.utils._pytree nodes by @BowenBao in #5459
- Japanese docs by @isamu-isozaki in #5478
- [docs] General updates by @stevhliu in #5378
- Add Latent Consistency Models Pipeline by @dg845 in #5448
- fix typo by @mymusise in #5505
- fix error of peft lora when xformers enabled by @AnyISalIn in #5506
- fix a bug in 2nd order schedulers when using in ensemble of experts config by @yiyixuxu in #5511
- [Schedulers] Fix 2nd order other than heun by @patrickvonplaten in #5526
- Add a new community pipeline by @nagolinc in #5477
- make style by @patrickvonplaten (direct commit on main)
- Improve typehints and docs in `diffusers/models` by @a-r-r-o-w in #5391
- make fix-copies by @patrickvonplaten (direct commit on main)
- Fix missing punctuation in PHILOSOPHY.md by @RampagingSloth in #5530
- fix a bug on `torch_dtype` argument in `from_single_file` of ControlNetModel by @xuyxu in #5528
- [docs] Loader docs by @stevhliu in #5473
- Add from_pt flag to enable model from PT by @RissyRan in #5501
- Remove multiple if-else statement in the get_activation function. by @hi-sushanta in #5446
- [Tests] Speed up expert of mixture tests by @patrickvonplaten in #5533
- [Tests] Optimize test configurations for faster execution by @p1kit in #5535
- [Remote code] Add functionality to run remote models, schedulers, pipelines by @patrickvonplaten in #5472
- Update train_dreambooth.py - fix typos by @nickkolok in #5539
- correct checkpoint in kandinsky2.2 doc page by @yiyixuxu in #5550
- [Core] fix FreeU disable method by @sayakpaul in #5552
- [docs] Internal classes API by @stevhliu in #5513
- fix error reported 'find_unused_parameters' running in mutiple GPUs by @jiaqiw09 in #5355
- docs: initial pt translation by @SirMonteiro in #5549
- Fix moved _expand_mask function by @patrickvonplaten in #5581
- [PEFT / Tests] Add peft slow tests on push by @younesbelkada in #5419
- Add realfill by @thuanz123 in #5456
- add fix to be able use StableDiffusionXLAdapterPipeline.from_single_file by @pshtif in #5547
- Stabilize DPM++, especially for SDXL and SDE-DPM++ by @LuChengTHU in #5541
- Fix incorrect loading of custom pipeline by @a-r-r-o-w in #5568
- [core / PEFT] Bump transformers min version for PEFT integration by @younesbelkada in #5579
- Fix divide by zero RuntimeWarning by @TimothyAlexisVass in #5543
- [Community Pipelines] add textual inversion support for stable_diffusion_ipex by @miaojinc in #5571
- fix a mistake in text2image training script for kandinsky2.2 by @yiyixuxu in #5244
- Update docker image for xformers by @DN6 in #5597
- [Docs] Fix typos by @standardAI in #5583
- [Docs] Fix typos, improve, update at Tutorials page by @standardAI in #5586
- [docs] Lu lambdas by @stevhliu in #5602
- Update final CPU offloading code for more diffusion pipelines by @clarencechen in #5589
- [Core] enable lora for sdxl adapters too and add slow tests. by @ilisparrow in #5555
- fix by @patrickvonplaten (direct commit on main)
- Remove Redundant Variables from Encoder and Decoder by @hi-sushanta in #5569
- Revert "Fix the order of width and height of original size in SDXL training script" by @patrickvonplaten in #5614
- [PEFT / LoRA] Fix civitai bug when network alpha is an empty dict by @younesbelkada in #5608
- [Docs] Fix typos, improve, update at Get Started page by @standardAI in #5587
- [SDXL Adapter] Revert load lora by @patrickvonplaten in #5615
- [docs] Kandinsky guide by @stevhliu in #4555
- [remote code] document trust remote code. by @sayakpaul in #5620
- [Tests] Fix cpu offload test by @patrickvonplaten in #5626
- [Docs] Fix typos, improve, update at Conceptual Guides page by @standardAI in #5585
- Animatediff Proposal by @DN6 in #5413
- [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page by @standardAI in #5584
- [LCM] Make sure img2img works by @patrickvonplaten in #5632
- Update animatediff docs to include section on Motion LoRAs by @DN6 in #5639
- [Easy] Minor AnimateDiff Doc nits by @sayakpaul in #5640
- fix a bug in `AutoPipeline.from_pipe()` when creating a controlnet pipeline from an existing controlnet by @yiyixuxu in #5638
- [Easy] clean up the LCM docstrings. by @sayakpaul in #5637
- Model loading speed optimization by @RyanJDick in #5635
- Clean up LCM Pipeline and Test Code. by @dg845 in #5641
- [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page by @standardAI in #5627
- [Core] support for tiny autoencoder in img2img by @sayakpaul in #5636
- Remove the redundant line from the adapter.py file. by @hi-sushanta in #5618
- add callbacks to denoising step by @yiyixuxu in #5427
- [Feat] PixArt-Alpha by @sayakpaul in #5642
- correct pipeline class name by @sayakpaul in #5652
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @dg845
- @kadirnar
- ✨ [Core] Add FreeU mechanism (#5164)
- @a-r-r-o-w
- @isamu-isozaki
- Japanese docs (#5478)
- @nagolinc
- Add a new community pipeline (#5477)
- @SirMonteiro
- docs: initial pt translation (#5549)
- @thuanz123
- Add realfill (#5456)
- @standardAI
- [Docs] Fix typos (#5583)
- [Docs] Fix typos, improve, update at Tutorials page (#5586)
- [Docs] Fix typos, improve, update at Get Started page (#5587)
- [Docs] Fix typos, improve, update at Conceptual Guides page (#5585)
- [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page (#5584)
- [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page (#5627)