aMUSEd
aMUSEd is a lightweight text-to-image model based on the MUSE architecture. It is particularly useful for applications that require a lightweight, fast model, such as generating many images at once. aMUSEd is currently a research release.
aMUSEd is a VQVAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Thanks to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly. This benefit is especially pronounced at larger batch sizes.
Text-to-image generation
import torch
from diffusers import AmusedPipeline
pipe = AmusedPipeline.from_pretrained(
"huggingface/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe.vqvae.to(torch.float32) # vqvae is producing nans in fp16
pipe = pipe.to("cuda")
prompt = "cowboy"
image = pipe(prompt, generator=torch.manual_seed(8)).images[0]
image.save("text2image_512.png")
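Since the batch-size benefit mentioned above is aMUSEd's main selling point, here is a minimal sketch of batched generation. It assumes `num_images_per_prompt` is accepted by `AmusedPipeline` as in other diffusers pipelines; adjust if your version differs.

```python
import torch
from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained(
    "huggingface/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe.vqvae.to(torch.float32)  # vqvae is producing nans in fp16
pipe = pipe.to("cuda")

# Generate several images for one prompt in a single batched call
images = pipe(
    "cowboy",
    num_images_per_prompt=8,
    generator=torch.manual_seed(8),
).images
for idx, image in enumerate(images):
    image.save(f"text2image_512_{idx}.png")
```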
Image-to-image generation
import torch
from diffusers import AmusedImg2ImgPipeline
from diffusers.utils import load_image
pipe = AmusedImg2ImgPipeline.from_pretrained(
"huggingface/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe.vqvae.to(torch.float32) # vqvae is producing nans in fp16
pipe = pipe.to("cuda")
prompt = "apple watercolor"
input_image = (
load_image(
"https://raw.githubusercontent.com/huggingface/amused/main/assets/image2image_256_orig.png"
)
.resize((512, 512))
.convert("RGB")
)
image = pipe(prompt, input_image, strength=0.7, generator=torch.manual_seed(3)).images[0]
image.save("image2image_512.png")
Inpainting
import torch
from diffusers import AmusedInpaintPipeline
from diffusers.utils import load_image
from PIL import Image
pipe = AmusedInpaintPipeline.from_pretrained(
"huggingface/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe.vqvae.to(torch.float32) # vqvae is producing nans in fp16
pipe = pipe.to("cuda")
prompt = "a man with glasses"
input_image = (
load_image(
"https://raw.githubusercontent.com/huggingface/amused/main/assets/inpainting_256_orig.png"
)
.resize((512, 512))
.convert("RGB")
)
mask = (
load_image(
"https://raw.githubusercontent.com/huggingface/amused/main/assets/inpainting_256_mask.png"
)
.resize((512, 512))
.convert("L")
)
image = pipe(prompt, input_image, mask, generator=torch.manual_seed(3)).images[0]
image.save("inpainting_512.png")
📜 Docs: https://huggingface.co/docs/diffusers/main/en/api/pipelines/amused
🛠️ Models:
amused-256: https://huggingface.co/huggingface/amused-256 (603M params)
amused-512: https://huggingface.co/huggingface/amused-512 (608M params)
3x faster SDXL
We’re excited to present an array of optimization techniques that can be used to accelerate the inference latency of text-to-image diffusion models. All of these can be done in native PyTorch without requiring additional C++ code.
These techniques are not specific to Stable Diffusion XL (SDXL) and can be used to improve other text-to-image diffusion models too. We encourage you to check out the detailed docs provided below.
📜 Docs: https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion
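As a hedged sketch of what the tutorial covers, several of these native-PyTorch techniques can be combined in a few lines: half precision, channels-last memory format, fused QKV projections (added in #6030 and #6179), and `torch.compile`. Actual speedups depend on your GPU and PyTorch version, so treat this as a starting point rather than the measured recipe.

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Channels-last memory format for the compute-heavy modules
pipe.unet.to(memory_format=torch.channels_last)
pipe.vae.to(memory_format=torch.channels_last)

# Fuse the attention QKV projections (see #6030 / #6179)
pipe.fuse_qkv_projections()

# Compile the UNet and VAE decoder; the first call pays a warmup cost
pipe.unet = torch.compile(pipe.unet, mode="max-autotune", fullgraph=True)
pipe.vae.decode = torch.compile(pipe.vae.decode, mode="max-autotune", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse").images[0]
```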
Interruptible pipelines
Interrupting the diffusion process is particularly useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.
This callback function should take the following arguments: `pipe`, `i`, `t`, and `callback_kwargs` (which must be returned). Set the pipeline's `_interrupt` attribute to `True` to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.
In this example, the diffusion process is stopped after 10 steps even though `num_inference_steps` is set to 50.
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.enable_model_cpu_offload()
num_inference_steps = 50
def interrupt_callback(pipe, i, t, callback_kwargs):
stop_idx = 10
if i == stop_idx:
pipe._interrupt = True
return callback_kwargs
pipe(
"A photo of a cat",
num_inference_steps=num_inference_steps,
callback_on_step_end=interrupt_callback,
)
📜 Docs: https://huggingface.co/docs/diffusers/main/en/using-diffusers/callback
peft in our LoRA training examples
We incorporated `peft` in all the officially supported training examples concerning LoRA. This greatly simplifies the code and improves readability. LoRA training has never been easier, thanks to `peft`!
More memory-friendly version of LCM LoRA SDXL training
We incorporated best practices from `peft` to make LCM LoRA training for SDXL more memory-friendly. As a result, you no longer have to initialize two UNets (teacher and student). This version also integrates with the `datasets` library for quick experimentation. Check out this section for more details.
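For illustration, a launch might look like the sketch below. The script name and flags are assumptions based on the existing LCM distillation examples, so check the linked section for the exact command.

```shell
# Hypothetical invocation -- verify the script path and flag names against
# examples/consistency_distillation in the diffusers repo before running.
accelerate launch train_lcm_distill_lora_sdxl.py \
  --pretrained_teacher_model="stabilityai/stable-diffusion-xl-base-1.0" \
  --dataset_name="your/dataset" \
  --mixed_precision="fp16" \
  --lora_rank=64 \
  --train_batch_size=8 \
  --gradient_checkpointing
```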
- [docs] Fix video link by @stevhliu in #5986
- Fix LLMGroundedDiffusionPipeline super class arguments by @KristianMischke in #5993
- Remove a duplicated line? by @sweetcocoa in #6010
- [examples/advanced_diffusion_training] bug fixes and improvements for LoRA Dreambooth SDXL advanced training script by @linoytsaban in #5935
- [advanced_dreambooth_lora_sdxl_tranining_script] readme fix by @linoytsaban in #6019
- [docs] Fix SVD video by @stevhliu in #6004
- [Easy] minor edits to setup.py by @sayakpaul in #5996
- [From Single File] Allow Text Encoder to be passed by @patrickvonplaten in #6020
- [Community Pipeline] Regional Prompting Pipeline by @hako-mikan in #6015
- [`logging`] Fix assertion bug by @standardAI in #6012
- [`Docs`] Update a link by @standardAI in #6014
- added attention_head_dim, attention_type, resolution_idx by @charchit7 in #6011
- fix style by @patrickvonplaten (direct commit on v0.25.0)
- [Kandinsky 3.0] Follow-up TODOs by @yiyixuxu in #5944
- [schedulers] create `self.sigmas` during init by @yiyixuxu in #6006
- Post Release: v0.24.0 by @patrickvonplaten in #5985
- LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft by @TonyLianLong in #6023
- adapt PixArtAlphaPipeline for pixart-lcm model by @lawrence-cj in #5974
- [PixArt Tests] remove fast tests from slow suite by @sayakpaul in #5945
- [LoRA serialization] fix: duplicate unet prefix problem. by @sayakpaul in #5991
- [advanced dreambooth lora sdxl training script] improve help tags by @linoytsaban in #6035
- fix StableDiffusionTensorRT super args error by @gujingit in #6009
- Update value_guided_sampling.py by @Parth38 in #6027
- Update Tests Fetcher by @DN6 in #5950
- Add variant argument to dreambooth lora sdxl advanced by @levi in #6021
- [Feature] Support IP-Adapter Plus by @okotaku in #5915
- [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ by @RuoyiDu in #6022
- [advanced dreambooth lora training script][bug_fix] change token_abstraction type to str by @linoytsaban in #6040
- [docs] Add Kandinsky 3 by @stevhliu in #5988
- [docs] `#Copied from` mechanism by @stevhliu in #6007
- Move kandinsky convert script by @DN6 in #6047
- Pin Ruff Version by @DN6 in #6059
- Ldm unet convert fix by @DN6 in #6038
- Fix demofusion by @radames in #6049
- [From single file] remove depr warning by @patrickvonplaten in #6043
- [advanced_dreambooth_lora_sdxl_tranining_script] save embeddings locally fix by @apolinario in #6058
- Device agnostic testing by @arsalanu in #5612
- [feat] allow SDXL pipeline to run with fused QKV projections by @sayakpaul in #6030
- fix by @DN6 (direct commit on v0.25.0)
- Use CC12M for LCM WDS training example by @pcuenca in #5908
- Disable Tests Fetcher by @DN6 in #6060
- [Advanced Diffusion Training] Cache latents to avoid VAE passes for every training step by @apolinario in #6076
- [Euler Discrete] Fix sigma by @patrickvonplaten in #6078
- Harmonize HF environment variables + deprecate use_auth_token by @Wauplin in #6066
- [docs] SDXL Turbo by @stevhliu in #6065
- Add ControlNet-XS support by @UmerHA in #5827
- Fix typing inconsistency in Euler discrete scheduler by @iabaldwin in #6052
- [`PEFT`] Adapt example scripts to use PEFT by @younesbelkada in #5388
- Fix clearing backend cache from device agnostic testing by @DN6 in #6075
- [Community] AnimateDiff + Controlnet Pipeline by @a-r-r-o-w in #5928
- EulerDiscreteScheduler add `rescale_betas_zero_snr` by @Beinsezii in #6024
- Add support for IPAdapterFull by @fabiorigano in #5911
- Fix a bug in `add_noise` function by @yiyixuxu in #6085
- [Advanced Diffusion Script] Add Widget default text by @apolinario in #6100
- [Advanced Training Script] Fix pipe example by @apolinario in #6106
- IP-Adapter for StableDiffusionControlNetImg2ImgPipeline by @charchit7 in #5901
- IP adapter support for most pipelines by @a-r-r-o-w in #5900
- Correct type annotation for `VaeImageProcessor.numpy_to_pil` by @edwardwli in #6111
- [`Docs`] Fix typos by @standardAI in #6122
- [feat: Benchmarking Workflow] add stuff for a benchmarking workflow by @sayakpaul in #5839
- [Community] Add SDE Drag pipeline by @Monohydroxides in #6105
- [docs] IP-Adapter API doc by @stevhliu in #6140
- Add missing subclass docs, Fix broken example in SD_safe by @a-r-r-o-w in #6116
- [advanced dreambooth lora sdxl training script] load pipeline for inference only if validation prompt is used by @linoytsaban in #6171
- [docs] Add missing `\` in lora.md by @pierd in #6174
- [Sigmas] Keep sigmas on CPU by @patrickvonplaten in #6173
- LoRA test fixes by @DN6 in #6163
- Add PEFT to training deps by @DN6 in #6148
- Clean Up Comments in LCM(-LoRA) Distillation Scripts. by @dg845 in #6145
- Compile test fix by @DN6 in #6104
- [LoRA] add an error message when dealing with _best_guess_weight_name ofline by @sayakpaul in #6184
- [Core] feat: enable fused attention projections for other SD and SDXL pipelines by @sayakpaul in #6179
- [Benchmarks] fix: lcm benchmarking reporting by @sayakpaul in #6198
- [Refactor autoencoders] feat: introduce autoencoders module by @sayakpaul in #6129
- Fix the test script in examples/text_to_image/README.md by @krahets in #6209
- Nit fix to training params by @osanseviero in #6200
- [Training] remove depcreated method from lora scripts. by @sayakpaul in #6207
- Fix SDXL Inpainting from single file with Refiner Model by @DN6 in #6147
- Fix possible re-conversion issues after extracting from safetensors by @d8ahazard in #6097
- Fix t2i. blog url by @abinthomasonline in #6205
- [Text-to-Video] Clean up pipeline by @patrickvonplaten in #6213
- [Torch Compile] Fix torch compile for svd vae by @patrickvonplaten in #6217
- Deprecate Pipelines by @DN6 in #6169
- Update README.md by @TilmannR in #6191
- Support img2img and inpaint in lpw-xl by @a-r-r-o-w in #6114
- Update train_text_to_image_lora.py by @haofanwang in #6144
- [SVD] Fix guidance scale by @patrickvonplaten in #6002
- Slow Test for Pipelines minor fixes by @DN6 in #6221
- Add converter method for ip adapters by @fabiorigano in #6150
- offload the optional module `image_encoder` by @yiyixuxu in #6151
- fix: init for vae during pixart tests by @sayakpaul in #6215
- [T2I LoRA training] fix: unscale fp16 gradient problem by @sayakpaul in #6119
- ControlNetXS fixes. by @DN6 in #6228
- add peft dependency to fast push tests by @sayakpaul in #6229
- [refactor embeddings]pixart-alpha by @yiyixuxu in #6212
- [Docs] Fix a code example in the ControlNet Inpainting documentation by @raven38 in #6236
- [docs] Batched seeds by @stevhliu in #6237
- [Fix] Fix Regional Prompting Pipeline by @hako-mikan in #6188
- EulerAncestral add `rescale_betas_zero_snr` by @Beinsezii in #6187
- [Refactor upsamplers and downsamplers] separate out upsamplers and downsamplers. by @sayakpaul in #6128
- Bump transformers from 4.34.0 to 4.36.0 in /examples/research_projects/realfill by @dependabot[bot] in #6255
- fix: unscale fp16 gradient problem & potential error by @lvzii in #6086
- [Refactor] move diffedit out of stable_diffusion by @sayakpaul in #6260
- move attend and excite out of stable_diffusion by @sayakpaul (direct commit on v0.25.0)
- Revert "move attend and excite out of stable_diffusion" by @sayakpaul (direct commit on v0.25.0)
- [Training] remove depcreated method from lora scripts again by @Yimi81 in #6266
- [Refactor] move k diffusion out of stable_diffusion by @sayakpaul in #6267
- [Refactor] move gligen out of stable diffusion. by @sayakpaul in #6265
- [Refactor] move sag out of `stable_diffusion` by @sayakpaul in #6264
- TST Fix LoRA test that fails with PEFT >= 0.7.0 by @BenjaminBossan in #6216
- [Refactor] move attend and excite out of `stable_diffusion` by @sayakpaul in #6261
- [Refactor] move panorama out of `stable_diffusion` by @sayakpaul in #6262
- [Deprecated pipelines] remove pix2pix zero from init by @sayakpaul in #6268
- [Refactor] move ldm3d out of stable_diffusion. by @sayakpaul in #6263
- open muse by @williamberman in #5437
- Remove ONNX inpaint legacy by @DN6 in #6269
- Remove peft tests from old lora backend tests by @DN6 in #6273
- Allow diffusers to load with Flax, w/o PyTorch by @pcuenca in #6272
- [Community Pipeline] Add Marigold Monocular Depth Estimation by @markkua in #6249
- Fix Prodigy optimizer in SDXL Dreambooth script by @apolinario in #6290
- [LoRA PEFT] fix LoRA loading so that correct alphas are parsed by @sayakpaul in #6225
- LoRA Unfusion test fix by @DN6 in #6291
- Fix typos in the `ValueError` for a nested image list as `StableDiffusionControlNetPipeline` input by @celestialphineas in #6286
- fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py by @mwkldeveloper in #6259
- fix: t2i apdater paper link by @sayakpaul in #6314
- fix: lora peft dummy components by @sayakpaul in #6308
- [Tests] Speed up example tests by @sayakpaul in #6319
- fix: cannot set guidance_scale by @Jannchie in #6326
- Change LCM-LoRA README Script Example Learning Rates to 1e-4 by @dg845 in #6304
- [Peft] fix saving / loading when unet is not "unet" by @kashif in #6046
- [Wuerstchen] fix fp16 training and correct lora args by @kashif in #6245
- [docs] fix: animatediff docs by @sayakpaul in #6339
- [Training] Add `datasets` version of LCM LoRA SDXL by @sayakpaul in #5778
- [`Peft` / `Lora`] Add `adapter_names` in `fuse_lora` by @younesbelkada in #5823
- [Diffusion fast] add doc for diffusion fast by @sayakpaul in #6311
- Add rescale_betas_zero_snr Argument to DDPMScheduler by @dg845 in #6305
- Interruptable Pipelines by @DN6 in #5867
- Update Animatediff docs by @DN6 in #6341
- Add AnimateDiff conversion scripts by @DN6 in #6340
- amused other pipelines docs by @williamberman in #6343
- [Docs] fix: video rendering on svd. by @sayakpaul in #6330
- [SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run by @priprapre in #6270
- adding auto1111 features to inpainting pipeline by @yiyixuxu in #6072
- Remove unused parameters and fixed `FutureWarning` by @Justin900429 in #6317
- amused update links to new repo by @williamberman in #6344
- [LoRA] make LoRAs trained with `peft` loadable when `peft` isn't installed by @sayakpaul in #6306
- Move ControlNetXS into Community Folder by @DN6 in #6316
- fix: use retrieve_latents by @Jannchie in #6337
- Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. by @dg845 in #6279
- Fix "push_to_hub only create repo in consistency model lora SDXL training script" by @aandyw in #6102
- Fix chunking in SVD by @DN6 in #6350
- Add PEFT to advanced training script by @apolinario in #6294
- Release: v0.25.0 by @sayakpaul (direct commit on v0.25.0)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @hako-mikan
- @TonyLianLong
- LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (#6023)
- @okotaku
- [Feature] Support IP-Adapter Plus (#5915)
- @RuoyiDu
- [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (#6022)
- @UmerHA
- Add ControlNet-XS support (#5827)
- @a-r-r-o-w
- @Monohydroxides
- [Community] Add SDE Drag pipeline (#6105)
- @dg845
- @markkua
- [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)
...