huggingface/diffusers v0.3.0 on GitHub

📚 Shiny new docs!

Thanks to the community efforts for [Docs] and [Type Hints] we've started populating the Diffusers documentation pages with lots of helpful guides, links and API references.

📝 New API & breaking changes

New API

Pipeline, Model, and Scheduler outputs can now be both dataclasses, Dicts, and Tuples:

image = pipe("The red cat is sitting on a chair")["sample"][0]

is now replaced by:

image = pipe("The red cat is sitting on a chair").images[0]
# or
image = pipe("The red cat is sitting on a chair")["image"][0]
# or
image = pipe("The red cat is sitting on a chair")[0]

Similarly:

sample = unet(...).sample

and

prev_sample = scheduler(...).prev_sample

is now possible!

🚨🚨🚨 Breaking change 🚨🚨🚨

This PR introduces breaking changes for the following public-facing methods:

VQModel.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latents = model.encode(...) to latents = model.encode(...)[0] or latents = model.encode(...).latens
VQModel.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
VQModel.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample
AutoencoderKL.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latent_dist = model.encode(...) to latent_dist = model.encode(...)[0] or latent_dist = model.encode(...).latent_dist
AutoencoderKL.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
AutoencoderKL.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample

🎨 New Stable Diffusion pipelines

A couple of new pipelines have been added to Diffusers! We invite you to experiment with them, and to take them as inspiration to create your cool new tasks. These are the new pipelines:

Image-to-image generation. In addition to using a text prompt, this pipeline lets you include an example image to be used as the initial state of the process. 🤗 Diffuse the Rest is a cool demo about it!
Inpainting (experimental). You can provide an image and a mask and ask Stable Diffusion to replace the mask.

For more details about how they work, please visit our new API documentation.

This is a summary of all the Stable Diffusion tasks that can be easily used with 🤗 Diffusers:

Pipeline	Tasks	Demo
pipeline_stable_diffusion.py	Text-to-Image Generation	🤗 Stable Diffusion
pipeline_stable_diffusion_img2img.py	Image-to-Image Text-Guided Generation	🤗 Diffuse the Rest
pipeline_stable_diffusion_inpaint.py	Experimental – Text-Guided Image Inpainting	Coming soon

🍬 Less memory usage for smaller GPUs

Now the diffusion models can take up significantly less VRAM (3.2 GB for Stable Diffusion) at the expense of 10% of speed thanks to the optimizations discussed in basujindal/stable-diffusion#117.

To make use of the attention optimization, just enable it with .enable_attention_slicing() after loading the pipeline:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

This will allow many more users to play with Stable Diffusion in their own computers! We can't wait to see what new ideas and results will be created by the community!

🐈‍⬛ Textual Inversion

Textual Inversion lets you personalize a Stable Diffusion model on your own images with just 3-5 samples.

GitHub: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion
Training: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb
Inference: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb

🍎 MPS backend for Apple Silicon

🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch mps device. You need to install PyTorch Preview (Nightly) on a Mac with M1 or M2 CPU, and then use the pipeline as usual:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

We are seeing great speedups (31s vs 214s in a M1 Max), but there are still a couple of limitations. We encourage you to read the documentation for the details.

🏭 Experimental ONNX exporter and pipeline for Stable Diffusion

We introduce a new (and experimental) Stable Diffusion pipeline compatible with the ONNX Runtime. This allows you to run Stable Diffusion on any hardware that supports ONNX (including a significant speedup on CPUs).

You need to use StableDiffusionOnnxPipeline instead of StableDiffusionPipeline. You also need to download the weights from the onnx branch of the repository, and indicate the runtime provider you want to use (CPU, in the following example):

from diffusers import StableDiffusionOnnxPipeline

pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider="CPUExecutionProvider",
    use_auth_token=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

⚠️ Warning: the script above takes a long time to download the external ONNX weights, so it will be faster to convert the checkpoint yourself (see below).

To convert your own checkpoint, run the conversion script locally:

python scripts/convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"

After that it can be loaded from the local path:

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="CPUExecutionProvider")

Improvements and bugfixes

Mark in painting experimental by @patrickvonplaten in #430
Add config docs by @patrickvonplaten in #429
[Docs] Models by @kashif in #416
[Docs] Using diffusers by @patrickvonplaten in #428
[Outputs] Improve syntax by @patrickvonplaten in #423
Initial ONNX doc (TODO: Installation) by @pcuenca in #426
[Tests] Correct image folder tests by @patrickvonplaten in #427
[MPS] Make sure it doesn't break torch < 1.12 by @patrickvonplaten in #425
[ONNX] Stable Diffusion exporter and pipeline by @anton-l in #399
[Tests] Make image-based SD tests reproducible with fixed datasets by @anton-l in #424
[Docs] Outputs.mdx by @patrickvonplaten in #422
[Docs] Fix scheduler docs by @patrickvonplaten in #421
[Docs] DiffusionPipeline by @patrickvonplaten in #418
Improve unconditional diffusers example by @satpalsr in #414
Improve latent diff example by @satpalsr in #413
Inference support for mps device by @pcuenca in #355
[Docs] Minor fixes in optimization section by @patrickvonplaten in #420
[Docs] Pipelines for inference by @satpalsr in #417
[Docs] Training docs by @patrickvonplaten in #415
Docs: fp16 page by @pcuenca in #404
Add typing to scheduling_sde_ve: init, set_timesteps, and set_sigmas function definitions by @danielpatrickhug in #412
Docs fix some typos by @natolambert in #408
[docs sprint] schedulers docs, will update by @natolambert in #376
Docs: fix undefined in toctree by @natolambert in #406
Attention slicing by @patrickvonplaten in #407
Rename variables from single letter to meaningful name fix by @rashmimarganiatgithub in #395
Docs: Stable Diffusion pipeline by @pcuenca in #386
Small changes to Philosophy by @pcuenca in #403
karras-ve docs by @kashif in #401
Score sde ve doc by @kashif in #400
[Docs] Finish Intro Section by @patrickvonplaten in #402
[Docs] Quicktour by @patrickvonplaten in #397
ddim docs by @kashif in #396
Docs: optimization / special hardware by @pcuenca in #390
added pndm docs by @kashif in #391
Update text_inversion.mdx by @johnowhitaker in #393
[Docs] Logging by @patrickvonplaten in #394
[Pipeline Docs] ddpm docs for sprint by @kashif in #382
[Pipeline Docs] Unconditional Latent Diffusion by @satpalsr in #388
Docs: Conceptual section by @pcuenca in #392
[Pipeline Docs] Latent Diffusion by @patrickvonplaten in #377
[textual-inversion] fix saving embeds by @patil-suraj in #387
[Docs] Let's go by @patrickvonplaten in #385
Add colab links to textual inversion by @apolinario in #375
Efficient Attention by @patrickvonplaten in #366
Use expand instead of ones to broadcast tensor by @pcuenca in #373
[Tests] Fix SD slow tests by @anton-l in #364
[Type Hint] VAE models by @daspartho in #365
[Type hint] scheduling lms discrete by @santiviquez in #360
[Type hint] scheduling karras ve by @santiviquez in #359
type hints: models/vae.py by @shepherd1530 in #346
[Type Hints] DDIM pipelines by @sidthekidder in #345
[ModelOutputs] Replace dict outputs with Dict/Dataclass and allow to return tuples by @patrickvonplaten in #334
package version on main should have .dev0 suffix by @mishig25 in #354
[textual_inversion] use tokenizer.add_tokens to add placeholder_token by @patil-suraj in #357
[Type hint] scheduling ddim by @santiviquez in #343
[Type Hints] VAE models by @daspartho in #344
[Type Hint] DDPM schedulers by @daspartho in #349
[Type hint] PNDM schedulers by @daspartho in #335
Fix typo in unet_blocks.py by @da03 in #353
[Commands] Add env command by @patrickvonplaten in #352
Add transformers and scipy to dependency table by @patrickvonplaten in #348
[Type Hint] Unet Models by @sidthekidder in #330
[Img2Img2] Re-add K LMS scheduler by @patrickvonplaten in #340
Use ONNX / Core ML compatible method to broadcast by @pcuenca in #310
[Type hint] PNDM pipeline by @daspartho in #327
[Type hint] Latent Diffusion Uncond pipeline by @santiviquez in #333
Add contributions to README and re-order a bit by @patrickvonplaten in #316
[CI] try to fix GPU OOMs between tests and excessive tqdm logging by @anton-l in #323
README: stable diffusion version v1-3 -> v1-4 by @pcuenca in #331
Textual inversion by @patil-suraj in #266
[Type hint] Score SDE VE pipeline by @santiviquez in #325
[CI] Cancel pending jobs for PRs on new commits by @anton-l in #324
[train_unconditional] fix gradient accumulation. by @patil-suraj in #308
Fix nondeterministic tests for GPU runs by @anton-l in #314
Improve README to show how to use SD without an access token by @patrickvonplaten in #315
Fix flake8 F401 imported but unused by @anton-l in #317
Allow downloading of revisions for models. by @okalldal in #303
Fix more links by @python273 in #312
Changed variable name from "h" to "hidden_states" by @JC-swEng in #285
Fix stable-diffusion-seeds.ipynb link by @python273 in #309
[Tests] Add fast pipeline tests by @patrickvonplaten in #302
Improve README by @patrickvonplaten in #301
[Refactor] Remove set_seed by @patrickvonplaten in #289
[Stable Diffusion] Hotfix by @patrickvonplaten in #299
Check dummy file by @patrickvonplaten in #297
Add missing auth tokens for two SD tests by @anton-l in #296
Fix GPU tests (token + single-process) by @anton-l in #294
[PNDM Scheduler] format timesteps attrs to np arrays by @NouamaneTazi in #273
Fix link by @python273 in #286
[Type hint] Karras VE pipeline by @patrickvonplaten in #288
Add datasets + transformers + scipy to test deps by @anton-l in #279
Easily understandable error if inference steps not set before using scheduler by @samedii in #263)
[Docs] Add some guides by @patrickvonplaten in #276
[README] Add readme for SD by @patrickvonplaten in #274
Refactor Pipelines / Community pipelines and add better explanations. by @patrickvonplaten in #257
Refactor progress bar by @hysts in #242
Support K-LMS in img2img by @anton-l in #270
[BugFix]: Fixed add_noise in LMSDiscreteScheduler by @nicolas-dufour in #253
[Tests] Make sure tests are on GPU by @patrickvonplaten in #269
Adds missing torch imports to inpainting and image_to_image example by @PulkitMishra in #265
Fix typo in README.md by @webel in #260
Fix inpainting script by @patil-suraj in #258
Initialize CI for code quality and testing by @anton-l in #256
add inpainting example script by @nagolinc in #241
Update README.md with examples by @natolambert in #252
Reproducible images by supplying latents to pipeline by @pcuenca in #247
Style the scripts directory by @anton-l in #250
Pin black==22.3 to keep a stable --preview flag by @anton-l in #249
[Clean up] Clean unused code by @patrickvonplaten in #245
added test workflow and fixed failing test by @kashif in #237
split tests_modeling_utils by @kashif in #223
[example/image2image] raise error if strength is not in desired range by @patil-suraj in #238
Add image2image example script. by @patil-suraj in #231
Remove dead code in resnet.py by @ydshieh in #218

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@kashif
- [Docs] Models (#416)
- karras-ve docs (#401)
- Score sde ve doc (#400)
- ddim docs (#396)
- added pndm docs (#391)
- [Pipeline Docs] ddpm docs for sprint (#382)
- added test workflow and fixed failing test (#237)
- split tests_modeling_utils (#223)

huggingface/diffusers v0.3.0 v0.3.0: New API, Stable Diffusion pipelines, low-memory inference, MPS backend, ONNX on GitHub