🐳 Depth-Guided Stable Diffusion and 2.1 checkpoints
The new depth-guided stable diffusion model is fully supported in this release. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.
Installing the `transformers` library from source is required for the MiDaS model:
pip install --upgrade git+https://github.com/huggingface/transformers/
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)
prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
The updated Stable Diffusion 2.1 checkpoints are also released and fully supported:
- https://huggingface.co/stabilityai/stable-diffusion-2-1
- https://huggingface.co/stabilityai/stable-diffusion-2-1-base
🦺 Safe Tensors
We now support SafeTensors: a new simple format for storing tensors safely (as opposed to pickle) that is still fast (zero-copy).
- [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
- [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16 |
---|---|---|---|---|---|---|---|
pickle (PyTorch) | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ |
H5 (Tensorflow) | ✓ | ✗ | ✓ | ✓ | ~ | ~ | ✗ |
SavedModel (Tensorflow) | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |
MsgPack (flax) | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ |
SafeTensors | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
More details about the comparison here: https://github.com/huggingface/safetensors#yet-another-format-
pip install safetensors
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.save_pretrained("./safe-stable-diffusion-2-1", safe_serialization=True)
# you can also push this checkpoint to the HF Hub and load from there
safe_pipe = StableDiffusionPipeline.from_pretrained("./safe-stable-diffusion-2-1")
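To make the "Safe" column of the comparison table concrete: loading a pickle can execute arbitrary code, which is the risk SafeTensors avoids by storing only raw tensor bytes plus metadata. The following is a minimal sketch using only the Python standard library. `Payload` is a hypothetical class invented for this illustration, and the evaluated expression is a harmless stand-in for whatever a malicious checkpoint might run.

```python
import pickle


class Payload:
    """Hypothetical object standing in for a tampered checkpoint entry."""

    def __reduce__(self):
        # pickle asks __reduce__ how to rebuild the object. Whatever
        # callable is returned here is invoked during pickle.loads().
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not give back a Payload: it runs eval("6 * 7") instead.
result = pickle.loads(blob)
print(result)  # 42
```

Replacing the expression with a call into `os` or `subprocess` would run just as silently, which is why pickle-based checkpoints from untrusted sources are best avoided.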
New Pipelines
🖌️ Paint-by-example
An implementation of Paint by Example: Exemplar-based Image Editing with Diffusion Models by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen.
- Add paint by example by @patrickvonplaten in #1533
import PIL.Image
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline
def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")
img_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example_1.png"
mask_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example_1.png"
example_url = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example_1.jpg"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))
example_image = download_image(example_url).resize((512, 512))
pipe = DiffusionPipeline.from_pretrained("Fantasy-Studio/Paint-by-Example", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
image = pipe(image=init_image, mask_image=mask_image, example_image=example_image).images[0]
Audio Diffusion and Latent Audio Diffusion
Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to and from mel spectrogram images.
from IPython.display import Audio, display
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to("cuda")
output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
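For readers unfamiliar with the mel scale behind these spectrogram images, here is a small standard-library sketch of converting between Hz and mel and of spacing frequency bands evenly in mel. It uses the common HTK-style formula, which is an assumption for illustration and not necessarily the variant the pipeline uses internally. The helper names (`hz_to_mel`, `mel_to_hz`, `mel_band_edges`) are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mel (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 1 band edges in Hz, equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / n_bands
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 1)]


edges = mel_band_edges(0.0, 8000.0, 8)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
print([round(e) for e in edges])
```

The bands get wider as frequency increases, so low frequencies receive more rows of the spectrogram image than high ones.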
[Experimental] K-Diffusion pipeline for Stable Diffusion
This pipeline was added to support the latest schedulers from @crowsonkb's k-diffusion.
The purpose of this pipeline is to compare scheduler implementations and updates, so new features from other pipelines are unlikely to be supported!
- [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603
pip install k-diffusion
from diffusers import StableDiffusionKDiffusionPipeline
import torch
pipe = StableDiffusionKDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
pipe = pipe.to("cuda")
pipe.set_scheduler("sample_heun")
image = pipe("astronaut riding horse", num_inference_steps=25).images[0]
New Schedulers
Heun scheduler inspired by Karras et al.
Implements Algorithm 1 of Karras et al. The scheduler was ported from @crowsonkb’s k-diffusion.
- Add 2nd order heun scheduler by @patrickvonplaten in #1336
from diffusers import HeunDiscreteScheduler, StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = HeunDiscreteScheduler.from_config(pipe.scheduler.config)
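As background on what "2nd order" means here, the sketch below applies Heun's method to a toy ODE and compares it with a plain Euler step. This is not the scheduler's code: the decay equation is a stand-in for the diffusion sampling ODE, and all function names are hypothetical.

```python
import math


def euler_step(f, t, y, h):
    """One first-order Euler step for dy/dt = f(t, y)."""
    return y + h * f(t, y)


def heun_step(f, t, y, h):
    """One second-order Heun step for dy/dt = f(t, y)."""
    slope_start = f(t, y)                 # slope at the current point
    y_pred = y + h * slope_start          # Euler prediction at t + h
    slope_end = f(t + h, y_pred)          # slope at the predicted point
    return y + 0.5 * h * (slope_start + slope_end)  # average both slopes


def integrate(step, f, y0, t0, t1, n_steps):
    """Apply `step` n_steps times from t0 to t1 and return the final value."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    """Toy ODE dy/dt = -y, whose exact solution is y(t) = exp(-t)."""
    return -y


exact = math.exp(-1.0)
euler_error = abs(integrate(euler_step, decay, 1.0, 0.0, 1.0, 10) - exact)
heun_error = abs(integrate(heun_step, decay, 1.0, 0.0, 1.0, 10) - exact)
print(f"Euler error: {euler_error:.5f}, Heun error: {heun_error:.5f}")
```

The trade-off is that each Heun step evaluates `f` twice. In a diffusion sampler `f` involves a model forward pass, so a second-order step costs roughly two model evaluations.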
Single step DPM-Solver
Single-step variant of DPM-Solver, based on the original DPM-Solver paper and its improved version, DPM-Solver++. An original implementation is available from the paper's authors.
- Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442
from diffusers import DPMSolverSinglestepScheduler, StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)
📝 Changelog
- [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
- Hotfix for AttributeErrors in OnnxStableDiffusionInpaintPipelineLegacy by @anton-l in #1448
- Speed up test and remove kwargs from call by @patrickvonplaten in #1446
- v-prediction training support by @patil-suraj in #1455
- Fix Flax `from_pt` by @pcuenca in #1436
- Ensure Flax pipeline always returns numpy array by @pcuenca in #1435
- Add 2nd order heun scheduler by @patrickvonplaten in #1336
- fix slow tests by @patrickvonplaten in #1467
- Flax support for Stable Diffusion 2 by @pcuenca in #1423
- Updates Image to Image Inpainting community pipeline README by @vvvm23 in #1370
- StableDiffusion: Decode latents separately to run larger batches by @kig in #1150
- Fix bug in half precision for DPMSolverMultistepScheduler by @rtaori in #1349
- [Train unconditional] Unwrap model before EMA by @anton-l in #1469
- Add `ort_nightly_directml` to the `onnxruntime` candidates by @anton-l in #1458
- Allow saving trained betas by @patrickvonplaten in #1468
- Fix dtype model loading by @patrickvonplaten in #1449
- [Dreambooth] Make compatible with alt diffusion by @patrickvonplaten in #1470
- Add better docs xformers by @patrickvonplaten in #1487
- Remove reminder comment by @pcuenca in #1489
- Bump to 0.10.0.dev0 + deprecations by @anton-l in #1490
- Add doc for Stable Diffusion on Habana Gaudi by @regisss in #1496
- Replace deprecated hub utils in `train_unconditional_ort` by @anton-l in #1504
- [Deprecate] Correct stacklevel by @patrickvonplaten in #1483
- simplyfy AttentionBlock by @patil-suraj in #1492
- Standardize on using `image` argument in all pipelines by @fboulnois in #1361
- support v prediction in other schedulers by @patil-suraj in #1505
- Fix Flax flip_sin_to_cos by @akashgokul in #1369
- Add an explicit `--image_size` to the conversion script by @anton-l in #1509
- fix heun scheduler by @patil-suraj in #1512
- [docs] [dreambooth training] accelerate.utils.write_basic_config by @williamberman in #1513
- [docs] [dreambooth training] num_class_images clarification by @williamberman in #1508
- [From pretrained] Allow returning local path by @patrickvonplaten in #1450
- Update conversion script to correctly handle SD 2 by @patrickvonplaten in #1511
- [refactor] Making the xformers mem-efficient attention activation recursive by @blefaudeux in #1493
- Do not use torch.long in mps by @pcuenca in #1488
- Fix Imagic example by @dhruvrnaik in #1520
- Fix training docs to install datasets by @pedrogengo in #1476
- Finalize 2nd order schedulers by @patrickvonplaten in #1503
- Fixed mask+masked_image in sd inpaint pipeline by @antoche in #1516
- Create train_dreambooth_inpaint.py by @thedarkzeno in #1091
- Update FlaxLMSDiscreteScheduler by @dzlab in #1474
- [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
- Add xformers attention to VAE by @kig in #1507
- [CI] Add slow MPS tests by @anton-l in #1104
- [Stable Diffusion Inpaint] Allow tensor as input image & mask by @patrickvonplaten in #1527
- Compute embedding distances with torch.cdist by @blefaudeux in #1459
- [Upscaling] Fix batch size by @patrickvonplaten in #1525
- Update bug-report.yml by @patrickvonplaten in #1548
- [Community Pipeline] Checkpoint Merger based on Automatic1111 by @Abhinay1997 in #1472
- [textual_inversion] Add an option for only saving the embeddings by @allo- in #781
- [examples] use from_pretrained to load scheduler by @patil-suraj in #1549
- fix mask discrepancies in train_dreambooth_inpaint by @thedarkzeno in #1529
- [refactor] make set_attention_slice recursive by @patil-suraj in #1532
- Research folder by @patrickvonplaten in #1553
- add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 by @teticio in #1426
- [Community download] Fix cache dir by @patrickvonplaten in #1555
- [Docs] Correct docs by @patrickvonplaten in #1554
- Fix typo by @pcuenca in #1558
- [docs] [dreambooth training] default accelerate config by @williamberman in #1564
- Mega community pipeline by @patrickvonplaten in #1561
- [examples] add check_min_version by @patil-suraj in #1550
- [dreambooth] make collate_fn global by @patil-suraj in #1547
- Standardize fast pipeline tests with PipelineTestMixin by @anton-l in #1526
- Add paint by example by @patrickvonplaten in #1533
- [Community Pipeline] fix lpw_stable_diffusion by @SkyTNT in #1570
- [Paint by Example] Better default for image width by @patrickvonplaten in #1587
- Add from_pretrained telemetry by @anton-l in #1461
- Correct order height & width in pipeline_paint_by_example.py by @Fantasy-Studio in #1589
- Fix common tests for FP16 by @anton-l in #1588
- [UNet2DConditionModel] add an option to upcast attention to fp32 by @patil-suraj in #1590
- Flax: avoid recompilation when params change by @pcuenca in #1096
- Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442
- fix upcast in slice attention by @patil-suraj in #1591
- Update scheduling_repaint.py by @Randolph-zeng in #1582
- Update RL docs for better sharing / adding models by @natolambert in #1563
- Make cross-attention check more robust by @pcuenca in #1560
- [ONNX] Fix flaky tests by @anton-l in #1593
- Trivial fix for undefined symbol in train_dreambooth.py by @bcsherma in #1598
- [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603
- [Versatile Diffusion] add upcast_attention by @patil-suraj in #1605
- Fix PyCharm/VSCode static type checking for dummy objects by @anton-l in #1596