❤️ PyTorch + Accelerate
⚠️ The PyTorch pipelines now require accelerate for improved model loading times!
Install Diffusers with pip install --upgrade diffusers[torch] to get everything in a single command.
🍎 Apple Silicon support with PyTorch 1.13
PyTorch and Apple have been working on improving mps support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!
Requirements
- Mac computer with Apple silicon (M1/M2) hardware.
- macOS 12.6 or later (13.0 or later recommended, as support is even better).
- arm64 version of Python.
- PyTorch 1.13.0 official release, installed from pip or the conda channels.
Memory efficient generation
Memory management is crucial to achieve fast generation speed. We recommend always using attention slicing on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of unified RAM, and may be the difference between generating an image in seconds rather than in minutes. Use it like this:
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")
# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()
prompt = "a photo of an astronaut riding a horse on mars"
# First-time "warmup" pass (the first mps inference pass produces slightly different results)
_ = pipe(prompt, num_inference_steps=1)
image = pipe(prompt).images[0]
image.save("astronaut.png")
Continuous Integration
Our automated tests now include a full battery of tests on the mps device. This will help identify issues early and ensure quality on Apple Silicon going forward.
See more details in the documentation.
💃 Dance Diffusion
diffusers goes audio! 🎵 Dance Diffusion by Harmonai is the first audio model in 🧨Diffusers!
- [Dance Diffusion] Add dance diffusion by @patrickvonplaten #803
Try it out to generate some random music:
from diffusers import DiffusionPipeline
import scipy
model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")
audio = pipeline(audio_length_in_s=4.0).audios[0]
# To save locally
scipy.io.wavfile.write("maestro_test.wav", pipeline.unet.sample_rate, audio.transpose())
🎉 Euler schedulers
These are the Euler schedulers, from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). The diffusers implementation is based on the original k-diffusion implementation by Katherine Crowson. The Euler schedulers are fast, often generating really good outputs with 20-30 steps.
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
🔥 Up to 2x faster inference with memory_efficient_attention
Even faster and more memory-efficient Stable Diffusion, using the flash attention implementation from xformers
- Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR #532
To leverage it, just make sure you have:
- PyTorch > 1.12
- CUDA available
- the xformers library installed
from diffusers import StableDiffusionPipeline
import torch
pipe = StableDiffusionPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
revision="fp16",
torch_dtype=torch.float16,
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()
with torch.inference_mode():
    sample = pipe("a small cat")
# optional: You can disable it via
# pipe.disable_xformers_memory_efficient_attention()
🚀 Much faster loading
Thanks to accelerate, pipeline loading is much, much faster. There are two parts to it:
- First, when a model is created, PyTorch initializes its weights by default. This takes a good amount of time. Using low_cpu_mem_usage (enabled by default), no initialization will be performed.
- Optionally, you can also use device_map="auto" to automatically select the best device(s) to which the pre-trained weights will be initially sent.
In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.
As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.
This feature requires PyTorch 1.9 or later and accelerate 0.8.0 or higher.
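A minimal sketch combining both options, using the same checkpoint as the examples above (low_cpu_mem_usage is passed explicitly here even though it is already the default):
from diffusers import StableDiffusionPipeline

# Skip PyTorch's default weight initialization and let accelerate
# place the pre-trained weights directly on the best available device(s)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    low_cpu_mem_usage=True,
    device_map="auto",
)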
🎨 RePaint
RePaint allows reusing any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. It is based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models by Andreas Lugmayr et al.
import torch
from diffusers import RePaintPipeline, RePaintScheduler
# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")
# original_image and mask_image are PIL images of the same size, loaded beforehand
generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
original_image=original_image,
mask_image=mask_image,
num_inference_steps=250,
eta=0.0,
jump_length=10,
jump_n_sample=10,
generator=generator,
)
inpainted_image = output.images[0]
🌍 Community Pipelines
Long Prompt Weighting Stable Diffusion
The pipeline lets you input a prompt without the 77-token length limit, and you can increase a word's weighting by using "()" or decrease it by using "[]". The pipeline also lets you use the main use cases of the Stable Diffusion pipeline in a single class.
For a code example, see Long Prompt Weighting Stable Diffusion
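As a quick sketch of the idea: community pipelines are loaded through the custom_pipeline argument (the lpw_stable_diffusion module name appears in the changelog below; the prompt and weights are illustrative):
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# "(word)" increases a word's weight, "[word]" decreases it
prompt = "a photo of an astronaut riding a (horse) on [mars]"
image = pipe(prompt).images[0]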
Speech to Image
Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion.
For a code example, see Speech to Image
- [Examples] add speech to image pipeline example by @MikailINTech in #897
Wildcard Stable Diffusion
A minimal implementation that allows users to add "wildcards", denoted by __wildcard__, to prompts. Wildcards act as placeholders for values randomly sampled from either a dictionary or a .txt file.
For a code example, see Wildcard Stable Diffusion
- Wildcard stable diffusion pipeline by @shyamsn97 in #900
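A minimal sketch of how such a prompt might look, assuming the pipeline is loaded as a community pipeline and accepts wildcard values through a dictionary and/or .txt files (the argument names below are assumptions for illustration):
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="wildcard_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# __animal__ and __object__ are placeholders replaced by randomly sampled values
prompt = "__animal__ sitting on a __object__"
out = pipe(
    prompt,
    wildcard_option_dict={"animal": ["cat", "dog", "fox"]},  # sampled from a dictionary
    wildcard_files=["object.txt"],                           # or from a .txt file
    num_prompt_samples=1,
)
image = out.images[0]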
Composable Stable Diffusion
Use logic operators to do compositional generation.
For a code example, see Composable Stable Diffusion
Imagic Stable Diffusion
Image editing with Stable Diffusion.
For a code example, see Imagic Stable Diffusion
Seed Resizing
Allows generating a larger image while keeping the content of the original image.
For a code example, see Seed Resizing
📝 Changelog
- [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907
- [Stable Diffusion] Add components function by @patrickvonplaten in #889
- [PNDM Scheduler] Make sure list cannot grow forever by @patrickvonplaten in #882
- [DiffusionPipeline.from_pretrained] add warning when passing unused k… by @patrickvonplaten in #870
- DOC Dreambooth Add --sample_batch_size=1 to the 8 GB dreambooth example script by @leszekhanusz in #829
- [Examples] add speech to image pipeline example by @MikailINTech in #897
- [dreambooth] dont use safety check when generating prior images by @patil-suraj in #922
- Dreambooth class image generation: using unique names to avoid overwriting existing image by @leszekhanusz in #847
- fix test_components by @patil-suraj in #928
- Fix Compatibility with Nvidia NGC Containers by @tasercake in #919
- [Community Pipelines] Fix pad_tokens_and_weights in lpw_stable_diffusion by @SkyTNT in #925
- Bump the version to 0.7.0.dev0 by @anton-l in #912
- Introduce the copy mechanism by @anton-l in #924
- [Tests] Move stable diffusion into their own files by @patrickvonplaten in #936
- [Flax] dont warn for bf16 weights by @patil-suraj in #923
- Support LMSDiscreteScheduler in LDMPipeline by @mkshing in #891
- Wildcard stable diffusion pipeline by @shyamsn97 in #900
- [MPS] fix mps failing tests by @kashif in #934
- fix a small typo in pipeline_ddpm.py by @chenguolin in #948
- Reorganize pipeline tests by @anton-l in #963
- v1-5 docs updates by @apolinario in #921
- add community pipeline docs; add minimal text to some empty doc pages by @natolambert in #930
- Fix typo: torch_type -> torch_dtype by @pcuenca in #972
- add num_inference_steps arg to DDPM by @tmabraham in #935
- Add Composable diffusion to community pipeline examples by @MarkRich in #951
- [Flax] added broadcast_to_shape_from_left helper and Scheduler tests by @kashif in #864
- [Tests] Fix mps reproducibility issue when running with pytest-xdist by @anton-l in #976
- mps changes for PyTorch 1.13 by @pcuenca in #926
- [Onnx] support half-precision and fix bugs for onnx pipelines by @SkyTNT in #932
- [Dance Diffusion] Add dance diffusion by @patrickvonplaten in #803
- [Dance Diffusion] FP16 by @patrickvonplaten in #980
- [Dance Diffusion] Better naming by @patrickvonplaten in #981
- Fix typo in documentation title by @echarlaix in #975
- Add --pretrained_model_name_revision option to train_dreambooth.py by @shirayu in #933
- Do not use torch.float64 on the mps device by @pcuenca in #942
- CompVis -> diffusers script - allow converting from merged checkpoint to either EMA or non-EMA by @patrickvonplaten in #991
- fix a bug in the new version by @xiaohu2015 in #957
- Fix typos by @shirayu in #978
- Add missing import by @juliensimon in #979
- minimal stable diffusion GPU memory usage with accelerate hooks by @piEsposito in #850
- [inpaint pipeline] fix bug for multiple prompts inputs by @xiaohu2015 in #959
- Enable multi-process DataLoader for dreambooth by @skirsten in #950
- Small modification to enable usage by external scripts by @briancw in #956
- [Flax] Add Textual Inversion by @duongna21 in #880
- Continuation of #942: additional float64 failure by @pcuenca in #996
- fix dreambooth script. by @patil-suraj in #1017
- [Accelerate model loading] Fix meta device and super low memory usage by @patrickvonplaten in #1016
- [Flax] Add finetune Stable Diffusion by @duongna21 in #999
- [DreamBooth] Set train mode for text encoder by @duongna21 in #1012
- [Flax] Add DreamBooth by @duongna21 in #1001
- Deprecate init_git_repo, refactor train_unconditional.py by @anton-l in #1022
- update readme for flax examples by @patil-suraj in #1026
- Probably nicer to specify dependency on tensorboard in the training example by @lukovnikov in #998
- Add --dataloader_num_workers to the DDPM training example by @anton-l in #1027
- Document sequential CPU offload method on Stable Diffusion pipeline by @piEsposito in #1024
- Support grayscale images in numpy_to_pil by @anton-l in #1025
- [Flax SD finetune] Fix dtype by @duongna21 in #1038
- fix F.interpolate() for large batch sizes by @NouamaneTazi in #1006
- [Tests] Improve unet / vae tests by @patrickvonplaten in #1018
- [Tests] Speed up slow tests by @patrickvonplaten in #1040
- Fix some failing tests by @patrickvonplaten in #1041
- [Tests] Better prints by @patrickvonplaten in #1043
- [Tests] no random latents anymore by @patrickvonplaten in #1045
- Update training and fine-tuning docs by @pcuenca in #1020
- Fix speedup ratio in fp16.mdx by @mwbyeon in #837
- clean incomplete pages by @natolambert in #1008
- Add seed resizing to community pipelines by @MarkRich in #1011
- Tests: upgrade PyTorch cuda to 11.7 to fix examples tests. by @pcuenca in #1048
- Experimental: allow fp16 in mps by @pcuenca in #961
- Move safety detection to model call in Flax safety checker by @jonatanklosko in #1023
- Fix pipelines user_agent, ignore CI requests by @anton-l in #1058
- [GitBot] Automatically close issues after inactivity by @patrickvonplaten in #1079
- Allow safety_checker to be None when using CPU offload by @pcuenca in #1078
- k-diffusion-euler by @hlky in #1019
- [Better scheduler docs] Improve usage examples of schedulers by @patrickvonplaten in #890
- [Tests] Fix slow tests by @patrickvonplaten in #1087
- Remove nn sequential by @patrickvonplaten in #1086
- Remove some unused parameter in CrossAttnUpBlock2D by @LaurentMazare in #1034
- Add imagic to community pipelines by @MarkRich in #958
- Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR in #532
- [docs] add euler scheduler in docs, how to use different schedulers by @patil-suraj in #1089
- Integration tests precision improvement for inpainting by @Lewington-pitsos in #1052
- lpw_stable_diffusion: Add is_cancelled_callback by @irgolic in #1053
- Rename latent by @patrickvonplaten in #1102
- fix typo in examples dreambooth README.md by @jorahn in #1073
- fix model card url in text inversion readme. by @patil-suraj in #1103
- [CI] Framework and hardware-specific CI tests by @anton-l in #997
- Fix a small typo of a variable name by @omihub777 in #1063
- Fix tests for equivalence of DDIM and DDPM pipelines by @sgrigory in #1069
- Fix padding in dreambooth by @shirayu in #1030
- [Flax] time embedding by @kashif in #1081
- Training to predict x0 in training example by @lukovnikov in #1031
- [Loading] Ignore unneeded files by @patrickvonplaten in #1107
- Fix hub-dependent tests for PRs by @anton-l in #1119
- Allow saving None pipeline components by @anton-l in #1118
- feat: add repaint by @Revist in #974
- Continuation of #1035 by @pcuenca in #1120
- VQ-diffusion by @williamberman in #658