diffusers 0.12.0 on Python PyPI

🪄 Instruct-Pix2Pix

Instruct-Pix2Pix is a Stable Diffusion model fine-tuned for editing images from human instructions. Given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image.

The model was released with the paper InstructPix2Pix: Learning to Follow Image Editing Instructions. More information about the model can be found in the paper.

pip install diffusers transformers safetensors accelerate

import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"
def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image
image = download_image(url)

prompt = "make the mountains snowy"
edit = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
edit.save("snowy_mountains.png")
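The two guidance knobs in this example, `image_guidance_scale` and `guidance_scale`, correspond to the two scales of the classifier-free guidance scheme described in the InstructPix2Pix paper. One scale pulls the prediction toward the input image and the other toward the text instruction. The following is a minimal NumPy sketch of that combination. The function name `combine_guidance` and the random arrays standing in for UNet noise predictions are hypothetical and not part of the diffusers API.

```python
import numpy as np


def combine_guidance(eps_uncond, eps_image, eps_full, image_scale, text_scale):
    """Combine three noise predictions with two guidance scales.

    eps_uncond: prediction with neither image nor text conditioning
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on both the image and the instruction
    """
    return (
        eps_uncond
        + image_scale * (eps_image - eps_uncond)  # push toward the input image
        + text_scale * (eps_full - eps_image)  # push toward the edit instruction
    )


rng = np.random.default_rng(0)
# Random arrays stand in for UNet outputs on a 4x64x64 latent (illustrative only).
eps_uncond, eps_image, eps_full = (rng.standard_normal((4, 64, 64)) for _ in range(3))

guided = combine_guidance(eps_uncond, eps_image, eps_full, image_scale=1.5, text_scale=7.0)
print(guided.shape)  # (4, 64, 64)
```

With both scales set to 1 the expression collapses to the fully conditioned prediction. Raising `image_guidance_scale` keeps the result closer to the input image, while raising `guidance_scale` follows the instruction more strongly.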

Add InstructPix2Pix pipeline by @patil-suraj #2040

🤖 DiT

Diffusion Transformers (DiTs) are class-conditional latent diffusion models that replace the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained models are trained on the ImageNet-1K dataset and can generate class-conditional images of 256x256 or 512x512 pixels.

The model was released with the paper Scalable Diffusion Models with Transformers.
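To make "a transformer operating on latent patches" concrete, here is a rough NumPy sketch of the patchify step. It assumes the 256x256 checkpoint works on a 4x32x32 VAE latent (the VAE downsamples by a factor of 8) and a patch size of 2, as the "/2" in DiT-XL/2 indicates. The `patchify` helper is illustrative and is not the DiT implementation in diffusers, which additionally projects each patch with a learned layer and adds positional embeddings.

```python
import numpy as np


def patchify(latent, patch_size):
    """Split a (C, H, W) latent into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    i.e. one token per non-overlapping patch, in row-major order.
    """
    c, h, w = latent.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (C, gh, p, gw, p) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    x = latent.reshape(c, gh, patch_size, gw, patch_size)
    x = x.transpose(1, 3, 2, 4, 0)
    return x.reshape(gh * gw, patch_size * patch_size * c)


# Assumed sizes: 256x256 image -> 4x32x32 latent, patch size 2
latent = np.random.default_rng(0).standard_normal((4, 32, 32))
tokens = patchify(latent, patch_size=2)
print(tokens.shape)  # (256, 16)
```

The transformer therefore sees a sequence of 256 tokens, which is why smaller patch sizes (more tokens) cost more compute.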

import torch
from diffusers import DiTPipeline

model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)

output = pipe(class_labels=class_ids)
image = output.images[0]  # label 'white shark'

⚡ LoRA

LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
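The "update matrices" are low-rank: instead of learning a full update for a d_out by d_in weight W, LoRA learns two small matrices B (d_out by r) and A (r by d_in) with a small rank r, and uses W + BA in the forward pass. The NumPy sketch below illustrates this with hypothetical layer sizes and rank. It is not the diffusers implementation (which plugs LoRA into the attention layers), and the optional scaling factor some implementations apply to BA is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for one projection layer and a small LoRA rank
d_out, d_in, rank = 768, 768, 4

W = rng.standard_normal((d_out, d_in))  # pre-trained weight, kept frozen
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable, random init
B = np.zeros((d_out, rank))  # trainable, zero init so B @ A starts at 0


def lora_forward(x):
    """Frozen path plus low-rank update: (W + B A) x, computed without forming B A."""
    return W @ x + B @ (A @ x)


x = rng.standard_normal(d_in)
# At initialization the LoRA layer reproduces the pre-trained layer exactly.
print(np.allclose(lora_forward(x), W @ x))  # True

full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)  # 589824 6144
```

Only A and B are trained and saved, which is why LoRA checkpoints are so much smaller than fully fine-tuned weights: in this toy layer the update stores 6,144 numbers instead of 589,824.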

LoRA was proposed in LoRA: Low-Rank Adaptation of Large Language Models. In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. cloneofsimo was the first to try out LoRA training for Stable Diffusion in the popular lora GitHub repository.

Diffusers now supports LoRA! This means you can now fine-tune a model like Stable Diffusion on consumer GPUs such as a Tesla T4 or an RTX 2080 Ti. LoRA support was added to UNet2DConditionModel and to the DreamBooth training script by @patrickvonplaten in #1884.

With LoRA, the fine-tuned checkpoints are just 3 MB in size. After fine-tuning, you can use the LoRA checkpoints like so:

from diffusers import StableDiffusionPipeline
import torch

model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")

To learn more about how to use LoRA in diffusers, see these resources:

text2image fine-tuning script (by @sayakpaul in #2031).
Official documentation discussing how LoRA is supported (by @sayakpaul in #2086).

📐 Customizable Cross Attention

LoRA is built on a new mechanism for customizing the cross-attention layers deep inside the UNet. This mechanism is also useful for other creative approaches such as Prompt-to-Prompt, and it makes it easier to apply optimizations like xFormers memory-efficient attention. This new "attention processor" abstraction was created by @patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
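The idea behind the abstraction is that an attention layer delegates its computation to a swappable "processor" object, so an alternative implementation (xFormers, sliced attention, LoRA, or one that records attention maps for Prompt-to-Prompt) can be plugged in without changing the model. The self-contained NumPy sketch below illustrates only that pattern. The class names `ToyAttention`, `DefaultProcessor` and `StoreMapsProcessor` are hypothetical and do not mirror the real diffusers classes or signatures; in diffusers, processors are installed on the UNet via `set_attn_processor`.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class DefaultProcessor:
    """Plain scaled dot-product cross-attention."""

    def __call__(self, attn, hidden_states, context):
        q = hidden_states @ attn.w_q
        k = context @ attn.w_k
        v = context @ attn.w_v
        probs = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        return probs @ v


class StoreMapsProcessor:
    """Same math, but keeps the attention maps (e.g. for Prompt-to-Prompt-style editing)."""

    def __init__(self):
        self.maps = []

    def __call__(self, attn, hidden_states, context):
        q = hidden_states @ attn.w_q
        k = context @ attn.w_k
        probs = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        self.maps.append(probs)
        return probs @ (context @ attn.w_v)


class ToyAttention:
    """Cross-attention layer whose computation is delegated to a processor."""

    def __init__(self, dim, context_dim, rng):
        self.w_q = rng.standard_normal((dim, dim))
        self.w_k = rng.standard_normal((context_dim, dim))
        self.w_v = rng.standard_normal((context_dim, dim))
        self.processor = DefaultProcessor()

    def set_processor(self, processor):
        self.processor = processor

    def __call__(self, hidden_states, context):
        return self.processor(self, hidden_states, context)


rng = np.random.default_rng(0)
attn = ToyAttention(dim=8, context_dim=16, rng=rng)
image_tokens = rng.standard_normal((64, 8))  # e.g. an 8x8 grid of latent pixels
text_tokens = rng.standard_normal((77, 16))  # e.g. 77 text-encoder tokens

baseline = attn(image_tokens, text_tokens)
recorder = StoreMapsProcessor()
attn.set_processor(recorder)
recorded = attn(image_tokens, text_tokens)

print(np.allclose(baseline, recorded), recorder.maps[0].shape)  # True (64, 77)
```

Swapping the processor changes what happens inside the layer (here, recording maps) while the layer's weights and outputs stay the same.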

🌿 Flax => PyTorch

In a long-requested feature, prolific community member @camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights to PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks @camenduru!
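Much of what such a conversion has to handle is layout: Flax and PyTorch store the same weights under different names ("kernel" vs. "weight") and with different axis orders. The NumPy sketch below shows the two most common transposes, assuming standard `flax.linen` and `torch.nn` conventions (Dense kernel `(in, out)` vs. Linear weight `(out, in)`; Conv kernel `(kh, kw, in, out)` vs. Conv2d weight `(out, in, kh, kw)`). The helper names and sizes are hypothetical and this is not the code added in #1900, which exposes the feature through the `from_flax` keyword of `from_pretrained`.

```python
import numpy as np


def flax_dense_to_torch_linear(kernel):
    """Flax Dense kernel (in, out) -> PyTorch Linear weight (out, in)."""
    return kernel.T


def flax_conv_to_torch_conv2d(kernel):
    """Flax Conv kernel (kh, kw, in, out) -> PyTorch Conv2d weight (out, in, kh, kw)."""
    return kernel.transpose(3, 2, 0, 1)


rng = np.random.default_rng(0)

dense_kernel = rng.standard_normal((320, 1280))  # hypothetical sizes
linear_weight = flax_dense_to_torch_linear(dense_kernel)
x = rng.standard_normal((2, 320))
# Flax computes x @ kernel; PyTorch's Linear computes x @ weight.T: same result.
print(np.allclose(x @ dense_kernel, x @ linear_weight.T))  # True

conv_kernel = rng.standard_normal((3, 3, 4, 320))  # hypothetical 3x3 conv, 4 -> 320 channels
conv_weight = flax_conv_to_torch_conv2d(conv_kernel)
print(conv_weight.shape)  # (320, 4, 3, 3)
```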

🌀 Flax Img2Img

Another community member, @dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!
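Generating 8 images at once on a TPU v2-8 relies on data parallelism: the TPU exposes 8 cores, and the Flax pipelines are run with `jax.pmap` so that each core processes one slice of the batch. The NumPy sketch below shows only the reshaping step that splits a batch along a leading device axis. The `shard` helper is written for illustration (Flax ships a similar utility in `flax.training.common_utils`), and 8 devices is an assumption matching a TPU v2-8.

```python
import numpy as np


def shard(batch, num_devices):
    """Reshape (N, ...) into (num_devices, N // num_devices, ...) for per-device processing."""
    n = batch.shape[0]
    if n % num_devices:
        raise ValueError(f"batch size {n} is not divisible by {num_devices} devices")
    return batch.reshape(num_devices, n // num_devices, *batch.shape[1:])


num_devices = 8  # assumption: one entry per TPU v2-8 core
# A batch of 8 tokenized prompts (77 token ids each) and 8 init images.
prompt_ids = np.zeros((8, 77), dtype=np.int32)
init_images = np.zeros((8, 3, 512, 512), dtype=np.float32)

print(shard(prompt_ids, num_devices).shape)  # (8, 1, 77)
print(shard(init_images, num_devices).shape)  # (8, 1, 3, 512, 512)
```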

🎲 DEIS Scheduler

DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps.
The scheduler was introduced in the paper Fast Sampling of Diffusion Models with Exponential Integrator. More information about the scheduler can be found in the paper.

from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]

feat : add log-rho deis multistep scheduler by @qsh-zh #1432
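The "exponential integrator" in DEIS refers to solving the linear part of the sampling ODE exactly and approximating only the remaining term, which tolerates much larger steps than a plain Euler update. The toy sketch below shows that effect on the scalar ODE dx/dt = -x + 1, chosen for illustration and not the diffusion ODE itself. Because the non-linear term is constant here, the first-order exponential integrator is exact; DEIS additionally uses multistep polynomial extrapolation of past model outputs, which this sketch does not implement.

```python
import math


def euler(x, lam, n_term, h, steps):
    """Explicit Euler for dx/dt = lam * x + n_term."""
    for _ in range(steps):
        x = x + h * (lam * x + n_term)
    return x


def exp_euler(x, lam, n_term, h, steps):
    """First-order exponential integrator: the linear part lam * x is integrated exactly."""
    decay = math.exp(lam * h)
    for _ in range(steps):
        x = decay * x + (decay - 1.0) / lam * n_term
    return x


x0, lam, n_term = 0.0, -1.0, 1.0
h, steps = 0.5, 4  # integrate to t = 2 with only 4 large steps
exact = 1.0 - math.exp(-2.0)  # closed-form solution x(t) = 1 - exp(-t)

print(abs(euler(x0, lam, n_term, h, steps) - exact))  # ~0.073
print(abs(exp_euler(x0, lam, n_term, h, steps) - exact))  # ~0 (exact up to rounding)
```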

Reproducibility

You can now pass CPU generators to all pipelines, even when the pipeline runs on GPU. Because the random noise is then sampled on the CPU, results are much more reproducible across different GPU hardware:

import torch
from diffusers import DDIMPipeline
import numpy as np

model_id = "google/ddpm-cifar10-32"

# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")

# create a generator for reproducibility
generator = torch.manual_seed(0)

# run the pipeline for just two steps and return a NumPy array
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())

See: #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
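Why CPU generators help: the random numbers come from the CPU generator no matter which GPU runs the model, and the resulting tensor is only then moved to the pipeline's device. The sketch below reproduces that idea with plain PyTorch. The helper `make_latents` is hypothetical (diffusers uses its own internal helper for this), and the latent shape is an assumption.

```python
import torch


def make_latents(shape, seed, device):
    """Sample Gaussian noise on the CPU with a seeded generator, then move it to `device`."""
    generator = torch.Generator(device="cpu").manual_seed(seed)
    latents = torch.randn(shape, generator=generator, device="cpu")
    return latents.to(device)


device = "cuda" if torch.cuda.is_available() else "cpu"
shape = (1, 4, 64, 64)  # assumed latent shape for a 512x512 Stable Diffusion image

a = make_latents(shape, seed=0, device=device)
b = make_latents(shape, seed=0, device=device)
print(torch.equal(a, b))  # True: same seed, same CPU random stream
```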

Important New Guides

Stable Diffusion 101: https://huggingface.co/docs/diffusers/stable_diffusion
Reproducibility: https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
LoRA: https://huggingface.co/docs/diffusers/training/lora

Important Bug Fixes

Don't download safetensors if library is not installed: #2057
Make sure that save_pretrained(...) doesn't accidentally delete files: #2038
Fix CPU offload docs for maximum memory gain: #1968
Fix conversion for exotically sorted weight names: #1959
Fix intermediate checkpointing for textual inversion, thanks @lstein #2072

All commits

update composable diffusion for an updated diffuser library by @nanlliu in #1697
[Tests] Fix UnCLIP cpu offload tests by @anton-l in #1769
Bump to 0.12.0.dev0 by @anton-l in #1771
[Dreambooth] flax fixes by @pcuenca in #1765
update train_unconditional_ort.py by @prathikr in #1775
Only test for xformers when enabling them #1773 by @kig in #1776
expose polynomial:power and cosine_with_restarts:num_cycles params by @zetyquickly in #1737
[Flax] Stateless schedulers, fixes and refactors by @skirsten in #1661
Correct hf hub download by @patrickvonplaten in #1767
Dreambooth docs: minor fixes by @pcuenca in #1758
Fix num images per prompt unclip by @patil-suraj in #1787
Add Flax stable diffusion img2img pipeline by @dhruvrnaik in #1355
Refactor cross attention and allow mechanism to tweak cross attention function by @patrickvonplaten in #1639
Fix OOM when using PyTorch with JAX installed. by @pcuenca in #1795
reorder model wrap + bug fix by @prathikr in #1799
Remove hardcoded names from PT scripts by @patrickvonplaten in #1778
[textual_inversion] unwrap_model text encoder before accessing weights by @patil-suraj in #1816
fix small mistake in annotation: 32 -> 64 by @Line290 in #1780
Make safety_checker optional in more pipelines by @pcuenca in #1796
Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by @camenduru in #1844
Avoid duplicating PyTorch + safetensors downloads. by @pcuenca in #1836
Width was typod as weight by @Helw150 in #1800
fix: resize transform now preserves aspect ratio by @parlance-zz in #1804
Make xformers optional even if it is available by @kn in #1753
Allow selecting precision to make Dreambooth class images by @kabachuha in #1832
unCLIP image variation by @williamberman in #1781
[Community Pipeline] MagicMix by @daspartho in #1839
[Versatile Diffusion] Fix cross_attention_kwargs by @patrickvonplaten in #1849
[Dtype] Align dtype casting behavior with Transformers and Accelerate by @patrickvonplaten in #1725
[StableDiffusionInpaint] Correct test by @patrickvonplaten in #1859
[textual inversion] add gradient checkpointing and small fixes. by @patil-suraj in #1848
Flax: Fix img2img and align with other pipeline by @skirsten in #1824
Make repo structure consistent by @patrickvonplaten in #1862
[Unclip] Make sure text_embeddings & image_embeddings can directly be passed to enable interpolation tasks. by @patrickvonplaten in #1858
Fix ema decay by @pcuenca in #1868
[Docs] Improve docs by @patrickvonplaten in #1870
[examples] update loss computation by @patil-suraj in #1861
[train_text_to_image] allow using non-ema weights for training by @patil-suraj in #1834
[Attention] Finish refactor attention file by @patrickvonplaten in #1879
Fix typo in train_dreambooth_inpaint by @pcuenca in #1885
Update ONNX Pipelines to use np.float64 instead of np.float by @agizmo in #1789
[examples] misc fixes by @patil-suraj in #1886
Fixes to the help for report_to in training scripts by @pcuenca in #1888
updated doc for stable diffusion pipelines by @yiyixuxu in #1770
Add UnCLIPImageVariationPipeline to dummy imports by @anton-l in #1897
Add accelerate and xformers versions to diffusers-cli env by @anton-l in #1898
[addresses issue #1642] add add_noise to scheduling-sde-ve by @aengusng8 in #1827
Add condtional generation to AudioDiffusionPipeline by @teticio in #1826
Fixes in comments in SD2 D2I by @neverix in #1903
[Deterministic torch randn] Allow tensors to be generated on CPU by @patrickvonplaten in #1902
[Docs] Remove duplicated API doc string by @patrickvonplaten in #1901
fix: DDPMScheduler.set_timesteps() by @Joqsan in #1912
Fix --resume_from_checkpoint step in train_text_to_image.py by @merfnad in #1914
Support training SD V2 with Flax by @yasyf in #1783
Fix lr-scaling store_true & default=True cli argument for textual_inversion training. by @aredden in #1090
Various Fixes for Flax Dreambooth by @yasyf in #1782
Test ResnetBlock2D by @hchings in #1850
Init for korean docs by @seriousran in #1910
New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots by @peterwilli in #1615
Improve reproduceability 2/3 by @patrickvonplaten in #1906
feat : add log-rho deis multistep scheduler by @qsh-zh in #1432
Feature/colossalai by @Fazziekey in #1793
[Docs] Add TRANSLATING.md file by @seriousran in #1920
[StableDiffusionimg2img] validating input type by @Shubhamai in #1913
[dreambooth] low precision guard by @williamberman in #1916
[Stable Diffusion Guide] 101 Stable Diffusion Guide directly into the docs by @patrickvonplaten in #1927
[Conversion] Make sure ema weights are extracted correctly by @patrickvonplaten in #1937
fix path to logo by @vvssttkk in #1939
Add automatic doc sorting by @patrickvonplaten in #1940
update to latest colossalai by @Fazziekey in #1951
fix typo in imagic_stable_diffusion.py by @andreemic in #1956
[Conversion SD] Make sure weirdly sorted keys work as well by @patrickvonplaten in #1959
allow loading ddpm models into ddim by @patrickvonplaten in #1932
[Community] Correct checkpoint merger by @patrickvonplaten in #1965
Update CLIPGuidedStableDiffusion.feature_extractor.size to fix TypeError by @oxidase in #1938
[CPU offload] correct cpu offload by @patrickvonplaten in #1968
[Docs] Update README.md by @haofanwang in #1960
Research project multi subject dreambooth by @klopsahlong in #1948
Example tests by @patrickvonplaten in #1982
Fix slow tests by @patrickvonplaten in #1983
Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script by @kn in #1942
Allow converting Flax to PyTorch by adding a "from_flax" keyword by @camenduru in #1900
Update docstring by @Warvito in #1971
[SD Img2Img] resize source images to multiple of 8 instead of 32 by @vvsotnikov in #1571
Update README.md to include our blog post by @sayakpaul in #1998
Fix a couple typos in Dreambooth readme by @pcuenca in #2004
Add tests for 2D UNet blocks by @hchings in #1945
[Conversion] Support convert diffusers to safetensors by @hua1995116 in #1996
[Community] Fix merger by @patrickvonplaten in #2006
[Conversion] Improve safetensors by @patrickvonplaten in #1989
[Black] Update black library by @patrickvonplaten in #2007
Fix typos in ColossalAI example by @haofanwang in #2001
Use pipeline tests mixin for UnCLIP pipeline tests + unCLIP MPS fixes by @williamberman in #1908
Change PNDMPipeline to use PNDMScheduler by @willdalh in #2003
[train_unconditional] fix LR scheduler init by @patil-suraj in #2010
[Docs] No more autocast by @patrickvonplaten in #2021
[Flax] Add Flax inpainting impl by @xvjiarui in #1966
Check k-diffusion version is at least 0.0.12 by @pcuenca in #2022
DiT Pipeline by @kashif in #1806
fix dit doc header by @patil-suraj in #2027
[LoRA] Add LoRA training script by @patrickvonplaten in #1884
[Dit] Fix dit tests by @patrickvonplaten in #2034
Fix typos and minor redundancies by @Joqsan in #2029
[Lora] Model card by @patrickvonplaten in #2032
[Save Pretrained] Remove dead code lines that can accidentally remove pytorch files by @patrickvonplaten in #2038
Fix EMA for multi-gpu training in the unconditional example by @anton-l in #1930
Minor fix in the documentation of LoRA by @hysts in #2045
Add InstructPix2Pix pipeline by @patil-suraj in #2040
Create repo before cloning in examples by @Wauplin in #2047
Remove modelcards dependency by @Wauplin in #2050
Module-ise "original stable diffusion to diffusers" conversion script by @damian0815 in #2019
[StableDiffusionInstructPix2Pix] use cpu generator in slow tests by @patil-suraj in #2051
[From pretrained] Don't download .safetensors files if safetensors is… by @patrickvonplaten in #2057
Correct Pix2Pix example by @patrickvonplaten in #2056
add community pipeline: StableUnCLIPPipeline by @budui in #2037
[LoRA] Adds example on text2image fine-tuning with LoRA by @sayakpaul in #2031
Safetensors loading in "convert_diffusers_to_original_stable_diffusion" by @cafeai in #2054
[examples] add dataloader_num_workers argument by @patil-suraj in #2070
Dreambooth: reduce VRAM usage by @gleb-akhmerov in #2039
[Paint by example] Fix cpu offload for paint by example by @patrickvonplaten in #2062
[textual_inversion] Fix resuming state when using gradient checkpointing by @pcuenca in #2072
[lora] Log images when using tensorboard by @pcuenca in #2078
Fix resume epoch for all training scripts except textual_inversion by @pcuenca in #2079
[dreambooth] fix multi on gpu. by @patil-suraj in #2088
Run inference on a specific condition and fix call of manual_seed() by @shirayu in #2074
[Feat] checkpoint_merger works on local models as well as ones that use safetensors by @lstein in #2060
xFormers attention op arg by @takuma104 in #2049
[docs] [dreambooth] note random crop by @williamberman in #2085
Remove wandb from text_to_image requirements.txt by @pcuenca in #2092
[doc] update example for pix2pix by @patil-suraj in #2101
Add lora tag to the model tags by @apolinario in #2103
[docs] Adds a doc on LoRA support for diffusers by @sayakpaul in #2086
Allow directly passing text embeddings to Stable Diffusion Pipeline for prompt weighting by @patrickvonplaten in #2071
Improve transformers versions handling by @patrickvonplaten in #2104
Reproducibility 3/3 by @patrickvonplaten in #1924

🙌 Significant community contributions 🙌

The following contributors have made significant changes to the library since the last release:

@nanlliu
- update composable diffusion for an updated diffuser library (#1697)
@skirsten
- [Flax] Stateless schedulers, fixes and refactors (#1661)
- Flax: Fix img2img and align with other pipeline (#1824)
@hchings
- Test ResnetBlock2D (#1850)
- Add tests for 2D UNet blocks (#1945)
@seriousran
- Init for korean docs (#1910)
- [Docs] Add TRANSLATING.md file (#1920)
@qsh-zh
- feat : add log-rho deis multistep scheduler (#1432)
@Fazziekey
- Feature/colossalai (#1793)
- update to latest colossalai (#1951)
@klopsahlong
- Research project multi subject dreambooth (#1948)
@xvjiarui
- [Flax] Add Flax inpainting impl (#1966)
@damian0815
- Module-ise "original stable diffusion to diffusers" conversion script (#2019)
@camenduru
- Allow converting Flax to PyTorch by adding a "from_flax" keyword (#1900)
