Taking Diffusers Beyond Image Generation
We are very excited about this release! It brings new pipelines for video and audio to `diffusers`, showing that diffusion is a great choice for all sorts of generative tasks. The modular, pluggable approach of `diffusers` was crucial to integrate the new models intuitively and cohesively with the rest of the library. We hope you appreciate the consistency of the APIs and implementations, as our ultimate goal is to provide the best toolbox to help you solve the tasks you're interested in. Don't hesitate to get in touch if you use `diffusers` for other projects!
In addition, `diffusers` 0.15 includes many new features and improvements, from performance and deployment gains (faster pipeline loading) to increased flexibility for creative tasks (Karras sigmas, weight prompting, support for Automatic1111 textual inversion embeddings), additional customization options (Multi-ControlNet), and training utilities (ControlNet, Min-SNR weighting). Read on for the details!
🎬 Text-to-Video
Text-guided video generation is no longer a fantasy - it's as simple as spinning up a Colab and running either of two powerful open-sourced video generation models.
Text-to-Video
Alibaba's DAMO Vision Intelligence Lab has open-sourced a research-only video generation model that can generate impressive video clips of up to a minute. To see Spider-Man surfing, simply copy-paste the following lines into your favorite Python interpreter:
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
video_frames = pipe(prompt, num_inference_steps=25).frames

# Save the generated frames as an mp4 file
video_path = export_to_video(video_frames)
```
For more information, have a look at the damo-vilab/text-to-video-ms-1.7b model card.
Text-to-Video Zero
Text2Video-Zero is a zero-shot text-to-video synthesis method that enables low-cost yet consistent video generation using only pre-trained text-to-image diffusion models, such as Stable Diffusion v1-5. Text2Video-Zero also naturally supports extensions of pre-trained text-to-image models such as Instruct Pix2Pix, ControlNet and DreamBooth, on top of which it enables Video Instruct Pix2Pix, pose-conditional, edge-conditional and DreamBooth-specialized applications.
For more information, please have a look at PAIR/Text2Video-Zero.
🔉 Audio Generation
Text-guided audio generation has made great progress over the last few months, with many advances based on diffusion models. The 0.15.0 release includes two powerful audio diffusion models.
AudioLDM
Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.
```python
import torch
from diffusers import AudioLDMPipeline

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
```
The resulting audio output can be saved as a .wav file:
```python
import scipy

# AudioLDM generates audio at a 16 kHz sample rate
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```
For more information, see cvssp/audioldm.
Spectrogram Diffusion
This model from the Magenta team is a MIDI-to-audio generator. The pipeline takes a MIDI file as input and autoregressively generates 5-second spectrograms, which are concatenated together at the end and decoded to audio via a spectrogram decoder.
```python
from diffusers import SpectrogramDiffusionPipeline, MidiProcessor

pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
pipe = pipe.to("cuda")
processor = MidiProcessor()

# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
output = pipe(processor("beethoven_hammerklavier_2.mid"))
audio = output.audios[0]
```
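As with AudioLDM, the generated audio is a NumPy array that can be written to a .wav file. A minimal sketch, assuming a 16 kHz output rate for the decoder (please check the model card to confirm):

```python
import scipy

# Assumed sample rate of the decoded audio; verify on the model card
scipy.io.wavfile.write("hammerklavier.wav", rate=16000, data=audio)
```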
📗 New Docs
Documentation is crucially important for `diffusers`, as it's one of the first resources people use to understand how everything works and to fix any issues they encounter. We have spent a lot of time in this release reviewing all documents, adding new ones, reorganizing sections and bringing code examples up to date with the latest APIs. This effort has been led by @stevhliu (thanks a lot! 🙌) and @yiyixuxu, but many others have chimed in and contributed.
Check it out: https://huggingface.co/docs/diffusers/index
Don't hesitate to open PRs with fixes to the documentation; they are greatly appreciated, as discussed in our (revised, of course) contribution guide.
🪄 Stable UnCLIP
Stable UnCLIP is the best open-sourced image variation model out there. Pass an initial image and optionally a prompt to generate variations of the image:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.to("cuda")

# Get the initial image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
image = load_image(url)

# Run image variation
image = pipe(image).images[0]
```
For more information, have a look at the stabilityai/stable-diffusion-2-1-unclip model card.
🚀 More ControlNet
ControlNet was released in `diffusers` in version 0.14.0, but we have some exciting developments: Multi-ControlNet, a training script, an upcoming community event, and a community image-to-image pipeline contributed by @mikegarts!
Multi-ControlNet
Thanks to community member @takuma104, it's now possible to use several ControlNet conditioning models at once! It works with the same API as before, only supplying a list of ControlNets instead of just one:
```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
).to("cuda")

# Pass a list of ControlNets instead of a single one
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model", torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny]
).to("cuda")

pose_image = ...
canny_image = ...
prompt = ...

# Supply one conditioning image per ControlNet, in the same order
image = pipe(prompt=prompt, image=[pose_image, canny_image]).images[0]
```
ControlNet Training
We have created a training script for ControlNet, and can't wait to see what new ideas the community may come up with! In fact, we are so pumped about it that we are organizing a JAX Diffusers sprint with a special focus on ControlNet, where participant teams will be assigned TPU v4-8 instances to work on their projects 🤯. Those are some mean machines, so make sure you join our Discord to follow the event: https://discord.com/channels/879548962464493619/897387888663232554/1092751149217615902.
🐈⬛ Textual Inversion, Revisited
Several great contributors have been working on textual inversion to get the most out of it. @isamu-isozaki made it possible to perform multi-token training, and @piEsposito & @GuiyeC created an easy way to load textual inversion embeddings. These contributors are always a pleasure to work with 🙌; we feel honored and proud of this community 🙏
Loading textual inversion embeddings is compatible with the Automatic1111 format, so you can download embeddings from other services (such as civitai) and easily apply them in `diffusers`. Please check the updated documentation for details.
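As a sketch of what loading looks like in practice (the embedding repository and the `<cat-toy>` token below are illustrative examples), you can pass a Hub repository id or a local Automatic1111-style file to `load_textual_inversion`:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Load a textual inversion embedding from the Hub (or a local .pt/.safetensors file)
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The learned placeholder token can now be used directly in prompts
image = pipe("a <cat-toy> sitting on a bench").images[0]
```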
🏃 Faster loading of cached pipelines
We conducted a thorough investigation of the pipeline loading process to make it as fast as possible. This is the before and after:
Previous: 2.27 sec
Now: 1.1 sec
Instead of performing 3 HTTP operations, we now get all we need with just one. That single call is necessary to check whether any of the components in the pipeline were updated – if that's the case, then we need to download the new files. This improvement also applies when you load individual models instead of pre-trained pipelines.
This may not sound like much, but many people use `diffusers` for user-facing services where models and pipelines have to be reused on demand. By minimizing latency, they can provide a better service to their users and minimize operating costs.
Latency can be further reduced by forcing `diffusers` to just use the items on disk and never check for updates. This is not recommended for most users, but can be interesting in production environments.
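For example, one way to skip the update check entirely is the `local_files_only` flag of `from_pretrained`, which loads exclusively from the local cache. A minimal sketch:

```python
from diffusers import DiffusionPipeline

# Never hit the network: load everything from the local cache.
# This raises an error if the files were not downloaded previously.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", local_files_only=True
)
```

Setting the `HF_HUB_OFFLINE` environment variable achieves a similar effect globally.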
🔩 Weight prompting using compel
Weight prompting is a popular method to increase the importance of some of the elements that appear in a text prompt, as a way to force image generation to obey those concepts. Because `diffusers` is used in a multitude of services and projects, we wanted to provide a very flexible way to adopt prompt weighting, so users can ultimately build the system they prefer. Our approach was to:
- Make the Stable Diffusion pipelines accept raw prompt embeddings. You are free to create the embeddings however you see fit, so users can come up with new ideas to express weighting in their projects.
- At the same time, adopt `compel`, by @damian0815, as a higher-level library to create the weighted embeddings.

You don't have to use `compel` to create the embeddings, but if you do, this is an example of how it looks in practice:
```python
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Compel converts weighted prompt strings into prompt embeddings
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
prompt = "a red cat playing with a ball++"
prompt_embeds = compel_proc(prompt)

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
```
As you can see, we assign more weight to the word `ball` using a compel-specific syntax (`ball++`). You can use other libraries (or your own) to create appropriate embeddings to pass to the pipeline.
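For instance, here is a minimal sketch of building plain (unweighted) prompt embeddings by hand with the pipeline's own tokenizer and text encoder, reusing the `pipe` from the example above; custom weighting schemes can then scale or combine these embeddings before passing them to the pipeline:

```python
import torch

prompt = "a red cat playing with a ball"

# Tokenize and encode the prompt the same way the pipeline does internally
text_inputs = pipe.tokenizer(
    prompt,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    prompt_embeds = pipe.text_encoder(text_inputs.input_ids)[0]

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
```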
You can read more details in the documentation.
🎲 Karras Sigmas for schedulers
Some `diffusers` schedulers now support Karras sigmas! Thanks, @nipunjindal!
See "Add Karras pattern to discrete euler" (#2956) for more information.
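As a sketch, enabling Karras sigmas is a one-line change when configuring a supported scheduler (shown here for `DPMSolverMultistepScheduler`, which gained the option in #3001):

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Use the Karras noise schedule instead of the default sigma spacing
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
```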
All commits
- Adding support for `safetensors` and LoRa. by @Narsil in #2448
- [Post release] Push post release by @patrickvonplaten in #2546
- Correct section docs by @patrickvonplaten in #2540
- adds `xformers` support to `train_unconditional.py` by @vvvm23 in #2520
- Bug Fix: Remove explicit message argument in deprecate by @alvanli in #2421
- Update pipeline_stable_diffusion_inpaint_legacy.py resize to integer multiple of 8 instead of 32 for init image and mask by @Laveraaa in #2350
- move test num_images_per_prompt to pipeline mixin by @williamberman in #2488
- Training tutorial by @stevhliu in #2473
- Fix regression introduced in #2448 by @Narsil in #2551
- Fix for InstructPix2PixPipeline to allow for prompt embeds to be passed in without prompts. by @DN6 in #2456
- [PipelineTesterMixin] Handle non-image outputs for attn slicing test by @sanchit-gandhi in #2504
- [Community Pipeline] Unclip Image Interpolation by @Abhinay1997 in #2400
- Fix: controlnet docs format by @vicoooo26 in #2559
- ema step, don't empty cuda cache by @williamberman in #2563
- Add custom vae (diffusers type) to onnx converter by @ForserX in #2325
- add OnnxStableDiffusionUpscalePipeline pipeline by @ssube in #2158
- Support convert LoRA safetensors into diffusers format by @haofanwang in #2403
- [Unet1d] correct docs by @patrickvonplaten in #2565
- [Training] Fix tensorboard typo by @patrickvonplaten in #2566
- allow Attend-and-excite pipeline work with different image sizes by @yiyixuxu in #2476
- Allow textual_inversion_flax script to use save_steps and revision flag by @haixinxu in #2075
- add intermediate logging for dreambooth training script by @yiyixuxu in #2557
- community controlnet inpainting pipelines by @williamberman in #2561
- [docs] Move relevant code for text2image to docs by @stevhliu in #2537
- [docs] Move DreamBooth training materials to docs by @stevhliu in #2547
- [docs] Move text-to-image LoRA training from blog to docs by @stevhliu in #2527
- Update quicktour by @stevhliu in #2463
- Support revision in Flax text-to-image training by @pcuenca in #2567
- fix the default value of doc by @xiaohu2015 in #2539
- Added multitoken training for textual inversion. Issue 369 by @isamu-isozaki in #661
- [Docs]Fix invalid link to Pokemons dataset by @zxypro1 in #2583
- [Docs] Weight prompting using compel by @patrickvonplaten in #2574
- community stablediffusion controlnet img2img pipeline by @mikegarts in #2584
- Improve dynamic thresholding and extend to DDPM and DDIM Schedulers by @clarencechen in #2528
- [docs] Move Textual Inversion training examples to docs by @stevhliu in #2576
- add deps table check updated to ci by @williamberman in #2590
- Add notebook doc img2img by @yiyixuxu in #2472
- [docs] Build notebooks from Markdown by @stevhliu in #2570
- [Docs] Fix link to colab by @patrickvonplaten in #2604
- [docs] Update unconditional image generation docs by @stevhliu in #2592
- Add OpenVINO documentation by @echarlaix in #2569
- Support LoRA for text encoder by @haofanwang in #2588
- fix: un-existing tmp config file in linux, avoid unnecessary disk IO by @knoopx in #2591
- Fixed incorrect width/height assignment in StableDiffusionDepth2ImgPi… by @antoche in #2558
- add flax pipelines to api doc + doc string examples by @yiyixuxu in #2600
- Fix typos by @standardAI in #2608
- Migrate blog content to docs by @stevhliu in #2477
- Add cache_dir to docs by @patrickvonplaten in #2624
- Make sure that DEIS, DPM and UniPC can correctly be switched in & out by @patrickvonplaten in #2595
- Revert "[docs] Build notebooks from Markdown" by @patrickvonplaten in #2625
- Up vesion at which we deprecate "revision='fp16'" since `transformers` is not released yet by @patrickvonplaten in #2623
- [Tests] Split scheduler tests by @patrickvonplaten in #2630
- Improve ddim scheduler and fix bug when prediction type is "sample" by @PeterL1n in #2094
- update paint by example docs by @williamberman in #2598
- [From pretrained] Speed-up loading from cache by @patrickvonplaten in #2515
- add translated docs by @LolitaSian in #2587
- [Dreambooth] Editable number of class images by @Mr-Philo in #2251
- Update quicktour.mdx by @standardAI in #2637
- Update basic_training.mdx by @standardAI in #2639
- controlnet sd 2.1 checkpoint conversions by @williamberman in #2593
- [docs] Update readme by @stevhliu in #2612
- [Pipeline loading] Remove send_telemetry by @patrickvonplaten in #2640
- [docs] Build Jax notebooks for real by @stevhliu in #2641
- Update loading.mdx by @standardAI in #2642
- Support non square image generation for StableDiffusionSAGPipeline by @AkiSakurai in #2629
- Update schedulers.mdx by @standardAI in #2647
- [attention] Fix attention by @patrickvonplaten in #2656
- Add support for Multi-ControlNet to StableDiffusionControlNetPipeline by @takuma104 in #2627
- [Tests] Adds a test suite for `EMAModel` by @sayakpaul in #2530
- fix the in-place modification in unet condition when using controlnet by @andrehuang in #2586
- image generation main process checks by @williamberman in #2631
- [Hub] Upgrade to 0.13.2 by @patrickvonplaten in #2670
- AutoencoderKL: clamp indices of blend_h and blend_v to input size by @kig in #2660
- Update README.md by @qwjaskzxl in #2653
- [Lora] correct lora saving & loading by @patrickvonplaten in #2655
- Add ddim noise comparative analysis pipeline by @aengusng8 in #2665
- Add support for different model prediction types in DDIMInverseScheduler by @clarencechen in #2619
- controlnet integration tests num_inference_steps=3 by @williamberman in #2672
- Controlnet training by @Ttl in #2545
- [Docs] Adds a documentation page for evaluating diffusion models by @sayakpaul in #2516
- [Tests] fix: slow serialization test by @sayakpaul in #2678
- Update Dockerfile CUDA by @patrickvonplaten in #2682
- T5Attention support for cross-attention by @kashif in #2654
- Update custom_pipeline_overview.mdx by @standardAI in #2684
- Update kerascv.mdx by @standardAI in #2685
- Update img2img.mdx by @standardAI in #2688
- Update conditional_image_generation.mdx by @standardAI in #2687
- Update controlling_generation.mdx by @standardAI in #2690
- Update unconditional_image_generation.mdx by @standardAI in #2686
- Add image_processor by @yiyixuxu in #2617
- [docs] Add overviews to each section by @stevhliu in #2657
- [docs] Create better navigation on index by @stevhliu in #2658
- [docs] Reorganize table of contents by @stevhliu in #2671
- Rename attention by @patrickvonplaten in #2691
- Adding `use_safetensors` argument to give more control to users by @Narsil in #2123
- [docs] Add safety checker to ethical guidelines by @stevhliu in #2699
- train_unconditional save restore unet parameters by @williamberman in #2706
- Improve deprecation error message when using cross_attention import by @patrickvonplaten in #2710
- fix image link in inpaint doc by @yiyixuxu in #2693
- [docs] Update ONNX doc to use `optimum` by @sayakpaul in #2702
- Enabling gradient checkpointing for VAE by @Pie31415 in #2536
- [Tests] Correct PT2 by @patrickvonplaten in #2724
- Update mps.mdx by @standardAI in #2749
- Update torch2.0.mdx by @standardAI in #2748
- Update fp16.mdx by @standardAI in #2746
- Update dreambooth.mdx by @standardAI in #2742
- Update philosophy.mdx by @standardAI in #2752
- Update text_inversion.mdx by @standardAI in #2751
- add: controlnet entry to training section in the docs. by @sayakpaul in #2677
- Update numbers for Habana Gaudi in documentation by @regisss in #2734
- Improve Contribution Doc by @patrickvonplaten in #2043
- Fix typos by @apivovarov in #2715
- [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline by @nipunjindal in #2723
- Add guidance start/end parameters to StableDiffusionControlNetImg2ImgPipeline by @hyowon-ha in #2731
- Fix mps tests on torch 2.0 by @pcuenca in #2766
- Add option to set dtype in pipeline.to() method by @1lint in #2317
- stable diffusion depth batching fix by @williamberman in #2757
- [docs] update torch 2 benchmark by @pcuenca in #2764
- [docs] Clarify purpose of reproducibility docs by @stevhliu in #2756
- [MS Text To Video] Add first text to video by @patrickvonplaten in #2738
- `mps`: remove warmup passes by @pcuenca in #2771
- Support for Offset Noise in examples by @haofanwang in #2753
- add: section on multiple controlnets. by @sayakpaul in #2762
- [Examples] InstructPix2Pix instruct training script by @sayakpaul in #2478
- deduplicate training section in the docs. by @sayakpaul in #2788
- [UNet3DModel] Fix with attn processor by @patrickvonplaten in #2790
- [doc wip] literalinclude by @mishig25 in #2718
- Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' by @ainoya in #2732
- Music Spectrogram diffusion pipeline by @kashif in #1044
- [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline by @nipunjindal in #2779
- [Docs] small fixes to the text to video doc. by @sayakpaul in #2787
- Update train_text_to_image_lora.py by @haofanwang in #2767
- Skip `mps` in text-to-video tests by @pcuenca in #2792
- Flax controlnet by @yiyixuxu in #2727
- [docs] Add Colab notebooks and Spaces by @stevhliu in #2713
- Add AudioLDM by @sanchit-gandhi in #2232
- Update train_text_to_image_lora.py by @haofanwang in #2795
- Add ModelEditing pipeline by @bahjat-kawar in #2721
- Relax DiT test by @kashif in #2808
- Update onnxruntime package candidates by @PeixuanZuo in #2666
- [Stable UnCLIP] Finish Stable UnCLIP by @patrickvonplaten in #2814
- [Docs] update docs (Stable unCLIP) to reflect the updated ckpts. by @sayakpaul in #2815
- StableDiffusionModelEditingPipeline documentation by @bahjat-kawar in #2810
- Update `examples` README.md to include the latest examples by @sayakpaul in #2839
- Ruff: apply same rules as in transformers by @pcuenca in #2827
- [Tests] Fix slow tests by @patrickvonplaten in #2846
- Fix StableUnCLIPImg2ImgPipeline handling of explicitly passed image embeddings by @unishift in #2845
- Helper function to disable custom attention processors by @pcuenca in #2791
- improve stable unclip doc. by @sayakpaul in #2823
- add: better warning messages when handling multiple conditionings. by @sayakpaul in #2804
- [WIP]Flax training script for controlnet by @yiyixuxu in #2818
- Make dynamo wrapped modules work with save_pretrained by @pcuenca in #2726
- [Init] Make sure shape mismatches are caught early by @patrickvonplaten in #2847
- updated onnx pndm test by @kashif in #2811
- [Stable Diffusion] Allow users to disable Safety checker if loading model from checkpoint by @Stax124 in #2768
- fix KarrasVePipeline bug by @junhsss in #2828
- StableDiffusionLongPromptWeightingPipeline: Do not hardcode pad token by @AkiSakurai in #2832
- Remove suggestion to use cuDNN benchmark in docs by @d1g1t in #2793
- Remove duplicate sentence in docstrings by @qqaatw in #2834
- Update the legacy inpainting SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2842
- Fix link to LoRA training guide in DreamBooth training guide by @ushuz in #2836
- [WIP][Docs] Use DiffusionPipeline Instead of Child Classes when Loading Pipeline by @dg845 in #2809
- Add `last_epoch` argument to `optimization.get_scheduler` by @felixblanke in #2850
- [WIP] Check UNet shapes in StableDiffusionInpaintPipeline init by @dg845 in #2853
- [2761]: Add documentation for extra_in_channels UNet1DModel by @nipunjindal in #2817
- [Tests] Adds a test to check if `image_embeds` None case is handled properly in `StableUnCLIPImg2ImgPipeline` by @sayakpaul in #2861
- Update evaluation.mdx by @standardAI in #2862
- Update overview.mdx by @standardAI in #2864
- Update alt_diffusion.mdx by @standardAI in #2865
- Update paint_by_example.mdx by @standardAI in #2869
- Update stable_diffusion_safe.mdx by @standardAI in #2870
- [Docs] Correct phrasing by @patrickvonplaten in #2873
- [Examples] Add streaming support to the ControlNet training example in JAX by @sayakpaul in #2859
- feat: allow offset_noise in dreambooth training example by @yamanahlawat in #2826
- [docs] Performance tutorial by @stevhliu in #2773
- [Docs] add an example use for `StableUnCLIPPipeline` in the pipeline docs by @sayakpaul in #2897
- add flax requirement by @yiyixuxu in #2894
- Support fp16 in conversion from original ckpt by @burgalon in #2733
- img2img.multiple.controlnets.pipeline by @mikegarts in #2833
- add load textual inversion embeddings to stable diffusion by @piEsposito in #2009
- [docs] add the Stable diffusion with Jax/Flax Guide into the docs by @yiyixuxu in #2487
- Add support `Karras sigmas` for StableDiffusionKDiffusionPipeline by @takuma104 in #2874
- Fix textual inversion loading by @GuiyeC in #2914
- Fix slow tests text inv by @patrickvonplaten in #2915
- Fix check_inputs in upscaler pipeline to allow embeds by @d1g1t in #2892
- Modify example with intel optimization by @mengfei25 in #2896
- [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline by @nipunjindal in #2902
- [Tests] Speed up test by @patrickvonplaten in #2919
- Have fix current pipeline link by @guspan-tanadi in #2910
- Update image_variation.mdx by @standardAI in #2911
- Update controlnet.mdx by @standardAI in #2912
- Update pipeline_stable_diffusion_controlnet.py by @patrickvonplaten in #2917
- Check for all different packages of opencv by @wfng92 in #2901
- fix: norm group test for UNet3D. by @sayakpaul in #2959
- Update euler_ancestral.mdx by @standardAI in #2932
- Update unipc.mdx by @standardAI in #2936
- Update score_sde_ve.mdx by @standardAI in #2937
- Update score_sde_vp.mdx by @standardAI in #2938
- Update ddim.mdx by @standardAI in #2926
- Update ddpm.mdx by @standardAI in #2929
- Removing explicit markdown extension by @guspan-tanadi in #2944
- Ensure validation image RGB not RGBA by @ernestchu in #2945
- Use `upload_folder` in training scripts by @Wauplin in #2934
- allow use custom local dataset for controlnet training scripts by @yiyixuxu in #2928
- fix post-processing by @yiyixuxu in #2968
- [docs] Simplify loading guide by @stevhliu in #2694
- update flax controlnet training script by @yiyixuxu in #2951
- [Pipeline download] Improve pipeline download for index and passed co… by @patrickvonplaten in #2980
- The variable name has been updated. by @kadirnar in #2970
- Update the K-Diffusion SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2962
- [Examples] Add support for Min-SNR weighting strategy for better convergence by @sayakpaul in #2899
- [scheduler] fix some scheduler dtype error by @furry-potato-maker in #2992
- minor fix in controlnet flax example by @yiyixuxu in #2986
- Explain how to install test dependencies by @pcuenca in #2983
- docs: Link Navigation Path API Pipelines by @guspan-tanadi in #2976
- add Min-SNR loss to Controlnet flax train script by @yiyixuxu in #3016
- dynamic threshold sampling bug fixes and docs by @williamberman in #3003
- Initial draft of Core ML docs by @pcuenca in #2987
- [Pipeline] Add TextToVideoZeroPipeline by @19and99 in #2954
- Small typo correction in comments by @rogerioagjr in #3012
- mps: skip unstable test by @pcuenca in #3037
- Update contribution.mdx by @mishig25 in #3054
- fix report tool by @patrickvonplaten in #3047
- Fix config prints and save, load of pipelines by @patrickvonplaten in #2849
- [docs] Reusing components by @stevhliu in #3000
- Fix imports for composable_stable_diffusion pipeline by @nthh in #3002
- config fixes by @williamberman in #3060
- accelerate min version for ProjectConfiguration import by @williamberman in #3042
- `AttentionProcessor.group_norm` num_channels should be `query_dim` by @williamberman in #3046
- Update documentation by @George-Ogden in #2996
- Fix scheduler type mismatch by @pcuenca in #3041
- Fix invocation of some slow Flax tests by @pcuenca in #3058
- add only cross attention to simple attention blocks by @williamberman in #3011
- Fix typo and format BasicTransformerBlock attributes by @off99555 in #2953
- unet time embedding activation function by @williamberman in #3048
- Attention processor cross attention norm group norm by @williamberman in #3021
- Attn added kv processor torch 2.0 block by @williamberman in #3023
- [Examples] Fix type-casting issue in the ControlNet training script by @sayakpaul in #2994
- [LoRA] Enabling limited LoRA support for text encoder by @sayakpaul in #2918
- fix slow tsets by @patrickvonplaten in #3066
- Fix InstructPix2Pix training in multi-GPU mode by @sayakpaul in #2978
- [Docs] update Self-Attention Guidance docs by @SusungHong in #2952
- Flax memory efficient attention by @pcuenca in #2889
- [WIP] implement rest of the test cases (LoRA tests) by @Pie31415 in #2824
- fix pipeline setattr value == None by @williamberman in #3063
- add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines by @ssube in #2597
- [2064]: Add Karras to DPMSolverMultistepScheduler by @nipunjindal in #3001
- Finish docs textual inversion by @patrickvonplaten in #3068
- [Docs] refactor text-to-video zero by @sayakpaul in #3049
- Update Flax TPU tests by @pcuenca in #3069
- Fix a bug of pano when not doing CFG by @ernestchu in #3030
- Text2video zero refinements by @19and99 in #3070
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @Abhinay1997
  - [Community Pipeline] Unclip Image Interpolation (#2400)
- @ssube
  - add OnnxStableDiffusionUpscalePipeline pipeline (#2158)
  - add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines (#2597)
- @haofanwang
  - Support convert LoRA safetensors into diffusers format (#2403)
  - Support LoRA for text encoder (#2588)
  - Support for Offset Noise in examples (#2753)
  - Update train_text_to_image_lora.py (#2767, #2795)
- @isamu-isozaki
  - Added multitoken training for textual inversion. Issue 369 (#661)
- @mikegarts
  - community stablediffusion controlnet img2img pipeline (#2584)
  - img2img.multiple.controlnets.pipeline (#2833)
- @LolitaSian
  - add translated docs (#2587)
- @Ttl
  - Controlnet training (#2545)
- @nipunjindal
  - [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline (#2723)
  - [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline (#2779)
  - [2761]: Add documentation for extra_in_channels UNet1DModel (#2817)
  - [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline (#2902)
  - [2905]: Add Karras pattern to discrete euler (#2956)
  - [2064]: Add Karras to DPMSolverMultistepScheduler (#3001)
- @bahjat-kawar
  - Add ModelEditing pipeline (#2721)
  - StableDiffusionModelEditingPipeline documentation (#2810)
- @piEsposito
  - add load textual inversion embeddings to stable diffusion (#2009)
- @19and99
  - [Pipeline] Add TextToVideoZeroPipeline (#2954)
  - Text2video zero refinements (#3070)
- @MuhHanif
  - Flax memory efficient attention (#2889)