Taking Diffusers Beyond Image Generation
We are very excited about this release! It brings new pipelines for video and audio to `diffusers`, showing that diffusion is a great choice for all sorts of generative tasks. The modular, pluggable approach of `diffusers` was crucial to integrate the new models intuitively and cohesively with the rest of the library. We hope you appreciate the consistency of the APIs and implementations, as our ultimate goal is to provide the best toolbox to help you solve the tasks you're interested in. Don't hesitate to get in touch if you use `diffusers` for other projects!
In addition, `diffusers` 0.15 includes many new features and improvements, from performance and deployment gains (faster pipeline loading) to increased flexibility for creative tasks (Karras sigmas, weight prompting, support for Automatic1111 textual inversion embeddings), additional customization options (Multi-ControlNet), and training utilities (ControlNet, Min-SNR weighting). Read on for the details!
🎬 Text-to-Video
Text-guided video generation is no longer a fantasy - it's as simple as spinning up a Colab and running either of two powerful open-sourced video generation models.
Text-to-Video
Alibaba's DAMO Vision Intelligence Lab has open-sourced a research-only video generation model that can generate impressive video clips of up to a minute. To see Spider-Man surfing, simply copy-paste the following lines into your favorite Python interpreter:
```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
video_frames = pipe(prompt, num_inference_steps=25).frames

# Save the generated frames as an mp4 file
video_path = export_to_video(video_frames)
```
For more information, have a look at the damo-vilab/text-to-video-ms-1.7b model card.
Text-to-Video Zero
Text2Video-Zero is a zero-shot text-to-video synthesis method that enables low-cost yet consistent video generation using only pre-trained text-to-image diffusion models, such as Stable Diffusion v1-5. Text2Video-Zero also naturally supports extensions of pre-trained text-to-image models such as Instruct Pix2Pix, ControlNet and DreamBooth, on top of which it enables Video Instruct Pix2Pix, pose-conditional, edge-conditional and DreamBooth-specialized applications.
For more information, please have a look at PAIR/Text2Video-Zero.
🔉 Audio Generation
Text-guided audio generation has made great progress over the last few months, with many advances based on diffusion models. The 0.15.0 release includes two powerful audio diffusion models.
AudioLDM
Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.
```python
import torch
from diffusers import AudioLDMPipeline

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
```
The resulting audio output can be saved as a .wav file:
```python
import scipy

# AudioLDM generates audio at a 16 kHz sample rate
scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```
For more information, see cvssp/audioldm.
Spectrogram Diffusion
This model from the Magenta team is a MIDI-to-audio generator. The pipeline takes a MIDI file as input and autoregressively generates 5-second spectrograms, which are concatenated together at the end and decoded to audio via a spectrogram decoder.
```python
from diffusers import SpectrogramDiffusionPipeline, MidiProcessor

pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
pipe = pipe.to("cuda")
processor = MidiProcessor()

# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
output = pipe(processor("beethoven_hammerklavier_2.mid"))
audio = output.audios[0]
```
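As with AudioLDM, the generated audio is a NumPy array that can be written to a .wav file. A minimal sketch, assuming a 16 kHz output rate for the decoder (please check the model card to confirm):

```python
import scipy

# Assumed sample rate of the decoded audio; verify on the model card
scipy.io.wavfile.write("hammerklavier.wav", rate=16000, data=audio)
```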
📗 New Docs
Documentation is crucially important for `diffusers`, as it's one of the first resources people use to understand how everything works and to fix any issues they encounter. We have spent a lot of time in this release reviewing all documents, adding new ones, reorganizing sections and bringing code examples up to date with the latest APIs. This effort has been led by @stevhliu (thanks a lot! 🙌) and @yiyixuxu, but many others have chimed in and contributed.
Check it out: https://huggingface.co/docs/diffusers/index
Don't hesitate to open PRs with fixes to the documentation; they are greatly appreciated, as discussed in our (revised, of course) contribution guide.
🪄 Stable UnCLIP
Stable UnCLIP is the best open-sourced image variation model out there. Pass an initial image and optionally a prompt to generate variations of the image:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.to("cuda")

# Get the initial image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
image = load_image(url)

# Run image variation
image = pipe(image).images[0]
```
For more information, have a look at the stabilityai/stable-diffusion-2-1-unclip model card.
🚀 More ControlNet
ControlNet was released in `diffusers` in version 0.14.0, but we have some exciting developments: Multi-ControlNet, a training script, an upcoming community event, and a community image-to-image pipeline contributed by @mikegarts!
Multi-ControlNet
Thanks to community member @takuma104, it's now possible to use several ControlNet conditioning models at once! It works with the same API as before, only supplying a list of ControlNets instead of just one:
```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
).to("cuda")

# Pass a list of ControlNets instead of a single one
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model", torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny]
).to("cuda")

pose_image = ...
canny_image = ...
prompt = ...

# Supply one conditioning image per ControlNet, in the same order
image = pipe(prompt=prompt, image=[pose_image, canny_image]).images[0]
```
ControlNet Training
We have created a training script for ControlNet, and can't wait to see what new ideas the community may come up with! In fact, we are so pumped about it that we are organizing a JAX Diffusers sprint with a special focus on ControlNet, where participant teams will be assigned TPU v4-8 instances to work on their projects 🤯. Those are some mean machines, so make sure you join our Discord to follow the event: https://discord.com/channels/879548962464493619/897387888663232554/1092751149217615902.
🐈⬛ Textual Inversion, Revisited
Several great contributors have been working on textual inversion to get the most out of it. @isamu-isozaki made it possible to perform multi-token training, and @piEsposito & @GuiyeC created an easy way to load textual inversion embeddings. These contributors are always a pleasure to work with 🙌; we feel honored and proud of this community 🙏
Loading textual inversion embeddings is compatible with the Automatic1111 format, so you can download embeddings from other services (such as civitai) and easily apply them in `diffusers`. Please check the updated documentation for details.
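As a sketch of what loading looks like in practice (the embedding repository and the `<cat-toy>` token below are illustrative examples), you can pass a Hub repository id or a local Automatic1111-style file to `load_textual_inversion`:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Load a textual inversion embedding from the Hub (or a local .pt/.safetensors file)
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The learned placeholder token can now be used directly in prompts
image = pipe("a <cat-toy> sitting on a bench").images[0]
```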
🏃 Faster loading of cached pipelines
We conducted a thorough investigation of the pipeline loading process to make it as fast as possible. This is the before and after:
Previous: 2.27 sec
Now: 1.1 sec
Instead of performing 3 HTTP operations, we now get all we need with just one. That single call is necessary to check whether any of the components in the pipeline were updated – if that's the case, then we need to download the new files. This improvement also applies when you load individual models instead of pre-trained pipelines.
This may not sound like much, but many people use `diffusers` for user-facing services where models and pipelines have to be reused on demand. By minimizing latency, they can provide a better service to their users and minimize operating costs.
Latency can be further reduced by forcing `diffusers` to just use the items on disk and never check for updates. This is not recommended for most users, but can be interesting in production environments.
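For example, one way to skip the update check entirely is the `local_files_only` flag of `from_pretrained`, which loads exclusively from the local cache. A minimal sketch:

```python
from diffusers import DiffusionPipeline

# Never hit the network: load everything from the local cache.
# This raises an error if the files were not downloaded previously.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", local_files_only=True
)
```

Setting the `HF_HUB_OFFLINE` environment variable achieves a similar effect globally.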
🔩 Weight prompting using compel
Weight prompting is a popular method to increase the importance of some of the elements that appear in a text prompt, as a way to force image generation to obey those concepts. Because `diffusers` is used in a multitude of services and projects, we wanted to provide a very flexible way to adopt prompt weighting, so users can ultimately build the system they prefer. Our approach was to:
- Make the Stable Diffusion pipelines accept raw prompt embeddings. You are free to create the embeddings however you see fit, so users can come up with new ideas to express weighting in their projects.
- At the same time, adopt `compel`, by @damian0815, as a higher-level library to create the weighted embeddings.

You don't have to use `compel` to create the embeddings, but if you do, this is an example of how it looks in practice:
```python
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Compel converts weighted prompt strings into prompt embeddings
compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
prompt = "a red cat playing with a ball++"
prompt_embeds = compel_proc(prompt)

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
```
As you can see, we assign more weight to the word `ball` using a compel-specific syntax (`ball++`). You can use other libraries (or your own) to create appropriate embeddings to pass to the pipeline.
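For instance, here is a minimal sketch of building plain (unweighted) prompt embeddings by hand with the pipeline's own tokenizer and text encoder, reusing the `pipe` from the example above; custom weighting schemes can then scale or combine these embeddings before passing them to the pipeline:

```python
import torch

prompt = "a red cat playing with a ball"

# Tokenize and encode the prompt the same way the pipeline does internally
text_inputs = pipe.tokenizer(
    prompt,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    prompt_embeds = pipe.text_encoder(text_inputs.input_ids)[0]

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
```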
You can read more details in the documentation.
🎲 Karras Sigmas for schedulers
Some `diffusers` schedulers now support Karras sigmas! Thanks, @nipunjindal!
See "Add Karras pattern to discrete euler" (#2956) for more information.
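As a sketch, enabling Karras sigmas is a one-line change when configuring a supported scheduler (shown here for `DPMSolverMultistepScheduler`, which gained the option in #3001):

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# Use the Karras noise schedule instead of the default sigma spacing
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
```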
All commits
- Adding support for `safetensors` and LoRa. by @Narsil in #2448
- [Post release] Push post release by @patrickvonplaten in #2546
- Correct section docs by @patrickvonplaten in #2540
- adds `xformers` support to `train_unconditional.py` by @vvvm23 in #2520
- Bug Fix: Remove explicit message argument in deprecate by @alvanli in #2421
- Update pipeline_stable_diffusion_inpaint_legacy.py resize to integer multiple of 8 instead of 32 for init image and mask by @Laveraaa in #2350
- move test num_images_per_prompt to pipeline mixin by @williamberman in #2488
- Training tutorial by @stevhliu in #2473
- Fix regression introduced in #2448 by @Narsil in #2551
- Fix for InstructPix2PixPipeline to allow for prompt embeds to be passed in without prompts. by @DN6 in #2456
- [PipelineTesterMixin] Handle non-image outputs for attn slicing test by @sanchit-gandhi in #2504
- [Community Pipeline] Unclip Image Interpolation by @Abhinay1997 in #2400
- Fix: controlnet docs format by @vicoooo26 in #2559
- ema step, don't empty cuda cache by @williamberman in #2563
- Add custom vae (diffusers type) to onnx converter by @ForserX in #2325
- add OnnxStableDiffusionUpscalePipeline pipeline by @ssube in #2158
- Support convert LoRA safetensors into diffusers format by @haofanwang in #2403
- [Unet1d] correct docs by @patrickvonplaten in #2565
- [Training] Fix tensorboard typo by @patrickvonplaten in #2566
- allow Attend-and-excite pipeline work with different image sizes by @yiyixuxu in #2476
- Allow textual_inversion_flax script to use save_steps and revision flag by @haixinxu in #2075
- add intermediate logging for dreambooth training script by @yiyixuxu in #2557
- community controlnet inpainting pipelines by @williamberman in #2561
- [docs] Move relevant code for text2image to docs by @stevhliu in #2537
- [docs] Move DreamBooth training materials to docs by @stevhliu in #2547
- [docs] Move text-to-image LoRA training from blog to docs by @stevhliu in #2527
- Update quicktour by @stevhliu in #2463
- Support revision in Flax text-to-image training by @pcuenca in #2567
- fix the default value of doc by @xiaohu2015 in #2539
- Added multitoken training for textual inversion. Issue 369 by @isamu-isozaki in #661
- [Docs]Fix invalid link to Pokemons dataset by @zxypro1 in #2583
- [Docs] Weight prompting using compel by @patrickvonplaten in #2574
- community stablediffusion controlnet img2img pipeline by @mikegarts in #2584
- Improve dynamic thresholding and extend to DDPM and DDIM Schedulers by @clarencechen in #2528
- [docs] Move Textual Inversion training examples to docs by @stevhliu in #2576
- add deps table check updated to ci by @williamberman in #2590
- Add notebook doc img2img by @yiyixuxu in #2472
- [docs] Build notebooks from Markdown by @stevhliu in #2570
- [Docs] Fix link to colab by @patrickvonplaten in #2604
- [docs] Update unconditional image generation docs by @stevhliu in #2592
- Add OpenVINO documentation by @echarlaix in #2569
- Support LoRA for text encoder by @haofanwang in #2588
- fix: un-existing tmp config file in linux, avoid unnecessary disk IO by @knoopx in #2591
- Fixed incorrect width/height assignment in StableDiffusionDepth2ImgPi… by @antoche in #2558
- add flax pipelines to api doc + doc string examples by @yiyixuxu in #2600
- Fix typos by @standardAI in #2608
- Migrate blog content to docs by @stevhliu in #2477
- Add cache_dir to docs by @patrickvonplaten in #2624
- Make sure that DEIS, DPM and UniPC can correctly be switched in & out by @patrickvonplaten in #2595
- Revert "[docs] Build notebooks from Markdown" by @patrickvonplaten in #2625
- Up vesion at which we deprecate "revision='fp16'" since `transformers` is not released yet by @patrickvonplaten in #2623
- [Tests] Split scheduler tests by @patrickvonplaten in #2630
- Improve ddim scheduler and fix bug when prediction type is "sample" by @PeterL1n in #2094
- update paint by example docs by @williamberman in #2598
- [From pretrained] Speed-up loading from cache by @patrickvonplaten in #2515
- add translated docs by @LolitaSian in #2587
- [Dreambooth] Editable number of class images by @Mr-Philo in #2251
- Update quicktour.mdx by @standardAI in #2637
- Update basic_training.mdx by @standardAI in #2639
- controlnet sd 2.1 checkpoint conversions by @williamberman in #2593
- [docs] Update readme by @stevhliu in #2612
- [Pipeline loading] Remove send_telemetry by @patrickvonplaten in #2640
- [docs] Build Jax notebooks for real by @stevhliu in #2641
- Update loading.mdx by @standardAI in #2642
- Support non square image generation for StableDiffusionSAGPipeline by @AkiSakurai in #2629
- Update schedulers.mdx by @standardAI in #2647
- [attention] Fix attention by @patrickvonplaten in #2656
- Add support for Multi-ControlNet to StableDiffusionControlNetPipeline by @takuma104 in #2627
- [Tests] Adds a test suite for `EMAModel` by @sayakpaul in #2530
- fix the in-place modification in unet condition when using controlnet by @andrehuang in #2586
- image generation main process checks by @williamberman in #2631
- [Hub] Upgrade to 0.13.2 by @patrickvonplaten in #2670
- AutoencoderKL: clamp indices of blend_h and blend_v to input size by @kig in #2660
- Update README.md by @qwjaskzxl in #2653
- [Lora] correct lora saving & loading by @patrickvonplaten in #2655
- Add ddim noise comparative analysis pipeline by @aengusng8 in #2665
- Add support for different model prediction types in DDIMInverseScheduler by @clarencechen in #2619
- controlnet integration tests num_inference_steps=3 by @williamberman in #2672
- Controlnet training by @Ttl in #2545
- [Docs] Adds a documentation page for evaluating diffusion models by @sayakpaul in #2516
- [Tests] fix: slow serialization test by @sayakpaul in #2678
- Update Dockerfile CUDA by @patrickvonplaten in #2682
- T5Attention support for cross-attention by @kashif in #2654
- Update custom_pipeline_overview.mdx by @standardAI in #2684
- Update kerascv.mdx by @standardAI in #2685
- Update img2img.mdx by @standardAI in #2688
- Update conditional_image_generation.mdx by @standardAI in #2687
- Update controlling_generation.mdx by @standardAI in #2690
- Update unconditional_image_generation.mdx by @standardAI in #2686
- Add image_processor by @yiyixuxu in #2617
- [docs] Add overviews to each section by @stevhliu in #2657
- [docs] Create better navigation on index by @stevhliu in #2658
- [docs] Reorganize table of contents by @stevhliu in #2671
- Rename attention by @patrickvonplaten in #2691
- Adding `use_safetensors` argument to give more control to users by @Narsil in #2123
- [docs] Add safety checker to ethical guidelines by @stevhliu in #2699
- train_unconditional save restore unet parameters by @williamberman in #2706
- Improve deprecation error message when using cross_attention import by @patrickvonplaten in #2710
- fix image link in inpaint doc by @yiyixuxu in #2693
- [docs] Update ONNX doc to use `optimum` by @sayakpaul in #2702
- Enabling gradient checkpointing for VAE by @Pie31415 in #2536
- [Tests] Correct PT2 by @patrickvonplaten in #2724
- Update mps.mdx by @standardAI in #2749
- Update torch2.0.mdx by @standardAI in #2748
- Update fp16.mdx by @standardAI in #2746
- Update dreambooth.mdx by @standardAI in #2742
- Update philosophy.mdx by @standardAI in #2752
- Update text_inversion.mdx by @standardAI in #2751
- add: controlnet entry to training section in the docs. by @sayakpaul in #2677
- Update numbers for Habana Gaudi in documentation by @regisss in #2734
- Improve Contribution Doc by @patrickvonplaten in #2043
- Fix typos by @apivovarov in #2715
- [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline by @nipunjindal in #2723
- Add guidance start/end parameters to StableDiffusionControlNetImg2ImgPipeline by @hyowon-ha in #2731
- Fix mps tests on torch 2.0 by @pcuenca in #2766
- Add option to set dtype in pipeline.to() method by @1lint in #2317
- stable diffusion depth batching fix by @williamberman in #2757
- [docs] update torch 2 benchmark by @pcuenca in #2764
- [docs] Clarify purpose of reproducibility docs by @stevhliu in #2756
- [MS Text To Video] Add first text to video by @patrickvonplaten in #2738
- `mps`: remove warmup passes by @pcuenca in #2771
- Support for Offset Noise in examples by @haofanwang in #2753
- add: section on multiple controlnets. by @sayakpaul in #2762
- [Examples] InstructPix2Pix instruct training script by @sayakpaul in #2478
- deduplicate training section in the docs. by @sayakpaul in #2788
- [UNet3DModel] Fix with attn processor by @patrickvonplaten in #2790
- [doc wip] literalinclude by @mishig25 in #2718
- Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' by @ainoya in #2732
- Music Spectrogram diffusion pipeline by @kashif in #1044
- [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline by @nipunjindal in #2779
- [Docs] small fixes to the text to video doc. by @sayakpaul in #2787
- Update train_text_to_image_lora.py by @haofanwang in #2767
- Skip `mps` in text-to-video tests by @pcuenca in #2792
- Flax controlnet by @yiyixuxu in #2727
- [docs] Add Colab notebooks and Spaces by @stevhliu in #2713
- Add AudioLDM by @sanchit-gandhi in #2232
- Update train_text_to_image_lora.py by @haofanwang in #2795
- Add ModelEditing pipeline by @bahjat-kawar in #2721
- Relax DiT test by @kashif in #2808
- Update onnxruntime package candidates by @PeixuanZuo in #2666
- [Stable UnCLIP] Finish Stable UnCLIP by @patrickvonplaten in #2814
- [Docs] update docs (Stable unCLIP) to reflect the updated ckpts. by @sayakpaul in #2815
- StableDiffusionModelEditingPipeline documentation by @bahjat-kawar in #2810
- Update `examples` README.md to include the latest examples by @sayakpaul in #2839
- Ruff: apply same rules as in transformers by @pcuenca in #2827
- [Tests] Fix slow tests by @patrickvonplaten in #2846
- Fix StableUnCLIPImg2ImgPipeline handling of explicitly passed image embeddings by @unishift in #2845
- Helper function to disable custom attention processors by @pcuenca in #2791
- improve stable unclip doc. by @sayakpaul in #2823
- add: better warning messages when handling multiple conditionings. by @sayakpaul in #2804
- [WIP]Flax training script for controlnet by @yiyixuxu in #2818
- Make dynamo wrapped modules work with save_pretrained by @pcuenca in #2726
- [Init] Make sure shape mismatches are caught early by @patrickvonplaten in #2847
- updated onnx pndm test by @kashif in #2811
- [Stable Diffusion] Allow users to disable Safety checker if loading model from checkpoint by @Stax124 in #2768
- fix KarrasVePipeline bug by @junhsss in #2828
- StableDiffusionLongPromptWeightingPipeline: Do not hardcode pad token by @AkiSakurai in #2832
- Remove suggestion to use cuDNN benchmark in docs by @d1g1t in #2793
- Remove duplicate sentence in docstrings by @qqaatw in #2834
- Update the legacy inpainting SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2842
- Fix link to LoRA training guide in DreamBooth training guide by @ushuz in #2836
- [WIP][Docs] Use DiffusionPipeline Instead of Child Classes when Loading Pipeline by @dg845 in #2809
- Add `last_epoch` argument to `optimization.get_scheduler` by @felixblanke in #2850
- [WIP] Check UNet shapes in StableDiffusionInpaintPipeline init by @dg845 in #2853
- [2761]: Add documentation for extra_in_channels UNet1DModel by @nipunjindal in #2817
- [Tests] Adds a test to check if `image_embeds` None case is handled properly in `StableUnCLIPImg2ImgPipeline` by @sayakpaul in #2861
- Update evaluation.mdx by @standardAI in #2862
- Update overview.mdx by @standardAI in #2864
- Update alt_diffusion.mdx by @standardAI in #2865
- Update paint_by_example.mdx by @standardAI in #2869
- Update stable_diffusion_safe.mdx by @standardAI in #2870
- [Docs] Correct phrasing by @patrickvonplaten in #2873
- [Examples] Add streaming support to the ControlNet training example in JAX by @sayakpaul in #2859
- feat: allow offset_noise in dreambooth training example by @yamanahlawat in #2826
- [docs] Performance tutorial by @stevhliu in #2773
- [Docs] add an example use for `StableUnCLIPPipeline` in the pipeline docs by @sayakpaul in #2897
- add flax requirement by @yiyixuxu in #2894
- Support fp16 in conversion from original ckpt by @burgalon in #2733
- img2img.multiple.controlnets.pipeline by @mikegarts in #2833
- add load textual inversion embeddings to stable diffusion by @piEsposito in #2009
- [docs] add the Stable diffusion with Jax/Flax Guide into the docs by @yiyixuxu in #2487
- Add support `Karras sigmas` for StableDiffusionKDiffusionPipeline by @takuma104 in #2874
- Fix textual inversion loading by @GuiyeC in #2914
- Fix slow tests text inv by @patrickvonplaten in #2915
- Fix check_inputs in upscaler pipeline to allow embeds by @d1g1t in #2892
- Modify example with intel optimization by @mengfei25 in #2896
- [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline by @nipunjindal in #2902
- [Tests] Speed up test by @patrickvonplaten in #2919
- Have fix current pipeline link by @guspan-tanadi in #2910
- Update image_variation.mdx by @standardAI in #2911
- Update controlnet.mdx by @standardAI in #2912
- Update pipeline_stable_diffusion_controlnet.py by @patrickvonplaten in #2917
- Check for all different packages of opencv by @wfng92 in #2901
- fix: norm group test for UNet3D. by @sayakpaul in #2959
- Update euler_ancestral.mdx by @standardAI in #2932
- Update unipc.mdx by @standardAI in #2936
- Update score_sde_ve.mdx by @standardAI in #2937
- Update score_sde_vp.mdx by @standardAI in #2938
- Update ddim.mdx by @standardAI in #2926
- Update ddpm.mdx by @standardAI in #2929
- Removing explicit markdown extension by @guspan-tanadi in #2944
- Ensure validation image RGB not RGBA by @ernestchu in #2945
- Use `upload_folder` in training scripts by @Wauplin in #2934
- allow use custom local dataset for controlnet training scripts by @yiyixuxu in #2928
- fix post-processing by @yiyixuxu in #2968
- [docs] Simplify loading guide by @stevhliu in #2694
- update flax controlnet training script by @yiyixuxu in #2951
- [Pipeline download] Improve pipeline download for index and passed co… by @patrickvonplaten in #2980
- The variable name has been updated. by @kadirnar in #2970
- Update the K-Diffusion SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2962
- [Examples] Add support for Min-SNR weighting strategy for better convergence by @sayakpaul in #2899
- [scheduler] fix some scheduler dtype error by @furry-potato-maker in #2992
- minor fix in controlnet flax example by @yiyixuxu in #2986
- Explain how to install test dependencies by @pcuenca in #2983
- docs: Link Navigation Path API Pipelines by @guspan-tanadi in #2976
- add Min-SNR loss to Controlnet flax train script by @yiyixuxu in #3016
- dynamic threshold sampling bug fixes and docs by @williamberman in #3003
- Initial draft of Core ML docs by @pcuenca in #2987
- [Pipeline] Add TextToVideoZeroPipeline by @19and99 in #2954
- Small typo correction in comments by @rogerioagjr in #3012
- mps: skip unstable test by @pcuenca in #3037
- Update contribution.mdx by @mishig25 in #3054
- fix report tool by @patrickvonplaten in #3047
- Fix config prints and save, load of pipelines by @patrickvonplaten in #2849
- [docs] Reusing components by @stevhliu in #3000
- Fix imports for composable_stable_diffusion pipeline by @nthh in #3002
- config fixes by @williamberman in #3060
- accelerate min version for ProjectConfiguration import by @williamberman in #3042
- `AttentionProcessor.group_norm` num_channels should be `query_dim` by @williamberman in #3046
- Update documentation by @George-Ogden in #2996
- Fix scheduler type mismatch by @pcuenca in #3041
- Fix invocation of some slow Flax tests by @pcuenca in #3058
- add only cross attention to simple attention blocks by @williamberman in #3011
- Fix typo and format BasicTransformerBlock attributes by @off99555 in #2953
- unet time embedding activation function by @williamberman in #3048
- Attention processor cross attention norm group norm by @williamberman in #3021
- Attn added kv processor torch 2.0 block by @williamberman in #3023
- [Examples] Fix type-casting issue in the ControlNet training script by @sayakpaul in #2994
- [LoRA] Enabling limited LoRA support for text encoder by @sayakpaul in #2918
- fix slow tsets by @patrickvonplaten in #3066
- Fix InstructPix2Pix training in multi-GPU mode by @sayakpaul in #2978
- [Docs] update Self-Attention Guidance docs by @SusungHong in #2952
- Flax memory efficient attention by @pcuenca in #2889
- [WIP] implement rest of the test cases (LoRA tests) by @Pie31415 in #2824
- fix pipeline setattr value == None by @williamberman in #3063
- add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines by @ssube in #2597
- [2064]: Add Karras to DPMSolverMultistepScheduler by @nipunjindal in #3001
- Finish docs textual inversion by @patrickvonplaten in #3068
- [Docs] refactor text-to-video zero by @sayakpaul in #3049
- Update Flax TPU tests by @pcuenca in #3069
- Fix a bug of pano when not doing CFG by @ernestchu in #3030
- Text2video zero refinements by @19and99 in #3070
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @Abhinay1997
  - [Community Pipeline] Unclip Image Interpolation (#2400)
- @ssube
  - add OnnxStableDiffusionUpscalePipeline pipeline (#2158)
  - add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines (#2597)
- @haofanwang
  - Support convert LoRA safetensors into diffusers format (#2403)
  - Support LoRA for text encoder (#2588)
  - Support for Offset Noise in examples (#2753)
  - Update train_text_to_image_lora.py (#2767, #2795)
- @isamu-isozaki
  - Added multitoken training for textual inversion. Issue 369 (#661)
- @mikegarts
  - community stablediffusion controlnet img2img pipeline (#2584)
  - img2img.multiple.controlnets.pipeline (#2833)
- @LolitaSian
  - add translated docs (#2587)
- @Ttl
  - Controlnet training (#2545)
- @nipunjindal
  - [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline (#2723)
  - [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline (#2779)
  - [2761]: Add documentation for extra_in_channels UNet1DModel (#2817)
  - [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline (#2902)
  - [2905]: Add Karras pattern to discrete euler (#2956)
  - [2064]: Add Karras to DPMSolverMultistepScheduler (#3001)
- @bahjat-kawar
  - Add ModelEditing pipeline (#2721)
  - StableDiffusionModelEditingPipeline documentation (#2810)
- @piEsposito
  - add load textual inversion embeddings to stable diffusion (#2009)
- @19and99
  - [Pipeline] Add TextToVideoZeroPipeline (#2954)
  - Text2video zero refinements (#3070)
- @MuhHanif
  - Flax memory efficient attention (#2889)