Würstchen
Würstchen is a diffusion model whose text-conditional component works in a highly compressed latent space of images, allowing for cheaper and faster inference.
Here is how to use Würstchen as a pipeline:
```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

pipeline = AutoPipelineForText2Image.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=torch.float16
).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
    caption,
    height=1024,
    width=1536,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
    num_images_per_prompt=4,
).images
```
To learn more about the pipeline, check out the official documentation.
This pipeline was contributed by one of the authors of Würstchen, @dome272, with help from @kashif and @patrickvonplaten.
👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen
T2I Adapters for Stable Diffusion XL (SDXL)
T2I-Adapter is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while keeping the original large text-to-image model frozen.
In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.
Below is an example of how to use the `StableDiffusionXLAdapterPipeline`.
First, make sure `controlnet_aux` is installed:
```bash
pip install -U controlnet_aux==0.0.7
```
Then we can initialize the pipeline:
```python
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (
    AutoencoderKL,
    EulerAncestralDiscreteScheduler,
    StableDiffusionXLAdapterPipeline,
    T2IAdapter,
)
from diffusers.utils import load_image, make_image_grid

# load adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(
    model_id, subfolder="scheduler"
)
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id,
    vae=vae,
    adapter=adapter,
    scheduler=euler_a,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
```
We then load an image and compute its lineart conditioning:
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
Then we generate:
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
image=image,
num_inference_steps=30,
adapter_conditioning_scale=0.8,
guidance_scale=7.5,
).images[0]
Refer to the official documentation to learn more about the `StableDiffusionXLAdapterPipeline`.
This blog post summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.
We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the documentation to learn more.
Thanks to @MC-E (one of the authors of T2I Adapters) for contributing the `StableDiffusionXLAdapterPipeline` in #4696.
Faster imports
We introduced "lazy imports" (#4829) to significantly improve the time it takes to import our modules (such as `pipelines`, `models`, and so on). Below is a comparison of the timings with and without lazy imports for `import diffusers`.
With lazy imports:
```
real    0m0.417s
user    0m0.714s
sys     0m0.499s
```
Without lazy imports:
```
real    0m5.391s
user    0m5.299s
sys     0m1.273s
```
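The real/user/sys figures above come from the shell's `time` command. If you want a rough cross-check on your own machine, the following minimal Python sketch measures just the wall-clock import time (numbers will vary by machine and disk cache):

```python
import time

start = time.perf_counter()
import diffusers  # with lazy imports, heavy submodules only load on first use

print(f"import diffusers took {time.perf_counter() - start:.3f}s")
```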
Faster LoRA loading
Previously, loading LoRA parameters with `load_lora_weights()` was time-consuming, as reported in #4975. To address this, we introduced a `low_cpu_mem_usage` argument to the `load_lora_weights()` method in #4994, which speeds up loading significantly. Just pass `low_cpu_mem_usage=True` to reap the benefits.
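For example (the LoRA repository path below is a placeholder; substitute your own):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# low_cpu_mem_usage=True avoids redundant CPU copies while the LoRA is loaded
pipe.load_lora_weights("path/to/your-lora", low_cpu_mem_usage=True)
```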
LoRA fusing
LoRA weights can now be fused into the model weights, allowing models that have loaded LoRA weights to run as fast as models without them. Multiple LoRAs can also be fused into the same model.
For more information, have a look at the documentation and the original PR: #4473.
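A minimal sketch of the workflow (the LoRA path is a placeholder):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/your-lora")  # placeholder LoRA

pipe.fuse_lora()    # fold the LoRA weights into the base model weights
image = pipe("Anthropomorphic cat dressed as a firefighter").images[0]  # runs at base-model speed
pipe.unfuse_lora()  # restore the original weights when you are done
```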
More support for LoRAs
Almost all LoRA formats out there for SDXL are now supported. For more details, please check the documentation.
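For instance, `load_lora_weights()` can ingest Kohya-style `.safetensors` checkpoints directly (the filename below is a placeholder):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# works for LoRAs trained outside diffusers, e.g. with the Kohya trainer
pipe.load_lora_weights(".", weight_name="my_kohya_sdxl_lora.safetensors")
```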
All commits
- fix: lora sdxl tests by @sayakpaul in #4652
- Support tiled encode/decode for `AutoencoderTiny` by @Isotr0py in #4627
- Add SDXL long weighted prompt pipeline (replace pr:4629) by @xhinker in #4661
- add config_file to from_single_file by @zuojianghua in #4614
- Add AudioLDM 2 by @sanchit-gandhi in #4549
- [docs] Add note in UniDiffusers Doc about PyTorch 1.X numerical stability issue by @dg845 in #4703
- [Core] enable lora for sdxl controlnets too and add slow tests. by @sayakpaul in #4666
- [LoRA] ensure different LoRA ranks for text encoders can be properly handled by @sayakpaul in #4669
- [LoRA] default to None when fc alphas are not available. by @sayakpaul in #4706
- Replaces `DIFFUSERS_TEST_DEVICE` backend list with trying device by @vvvm23 in #4673
- add convert diffuser pipeline of XL to original stable diffusion by @realliujiaxu in #4596
- Add reference_attn & reference_adain support for sdxl by @zideliu in #4502
- [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
- rename test file to run, so that examples tests do not fail by @patrickvonplaten in #4715
- Revert "Move controlnet load local tests to nightly by @patrickvonplaten in #4543)"
- Fix all docs by @patrickvonplaten in #4721
- fix bad error message when transformers is missing by @patrickvonplaten in #4714
- Fix AutoencoderTiny encoder scaling convention by @madebyollin in #4682
- [Examples] fix checkpointing and casting bugs in `train_text_to_image_lora_sdxl.py` by @sayakpaul in #4632
- [AudioLDM Docs] Fix docs for output by @sanchit-gandhi in #4737
- [docs] add variant="fp16" flag by @realliujiaxu in #4678
- [AudioLDM Docs] Update docstring by @sanchit-gandhi in #4744
- fix dummy import for AudioLDM2 by @patil-suraj in #4741
- change validation scheduler for train_dreambooth.py when training IF by @wyz894272237 in #4333
- add a step_index counter by @yiyixuxu in #4347
- [AudioLDM2] Doc fixes by @sanchit-gandhi in #4739
- Bugfix for SDXL model loading in low ram system. by @Symbiomatrix in #4628
- Clean up flaky behaviour on Slow CUDA Pytorch Push Tests by @DN6 in #4759
- [Tests] Fix paint by example by @patrickvonplaten in #4761
- [fix] multi t2i adapter set total_downscale_factor by @williamberman in #4621
- [Examples] Add madebyollin VAE to SDXL LoRA example, along with an explanation by @mnslarcher in #4762
- [LoRA] relax lora loading logic by @sayakpaul in #4610
- [Examples] fix sdxl dreambooth lora checkpointing. by @sayakpaul in #4749
- fix sdxl_lwp empty neg_prompt error issue by @xhinker in #4743
- improve setup.py by @sayakpaul in #4748
- Torch device by @patrickvonplaten in #4755
- [AudioLDM 2] Pipeline fixes by @sanchit-gandhi in #4738
- Convert MusicLDM by @sanchit-gandhi in #4579
- [WIP ] Proposal to address precision issues in CI by @DN6 in #4775
- fix a bug in `from_pretrained` when load optional components by @yiyixuxu in #4745
- fix bug of progress bar in clip guided images mixing by @scnuhealthy in #4729
- Fixed broken link of CLIP doc in evaluation doc by @mayank2 in #4760
- instance_prompt->class_prompt by @williamberman in #4784
- refactor prepare_mask_and_masked_image with VaeImageProcessor by @yiyixuxu in #4444
- Allow passing a checkpoint state_dict to convert_from_ckpt (instead of just a string path) by @cmdr2 in #4653
- [SDXL] Add docs about forcing passed embeddings to be 0 by @patrickvonplaten in #4783
- [Core] Support negative conditions in SDXL by @sayakpaul in #4774
- Unet fix by @canberk17 in #4769
- [Tests] Tighten up LoRA loading relaxation by @sayakpaul in #4787
- [docs] Fix syntax for compel by @stevhliu in #4794
- [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795
- [SDXL Lora] Fix last ben sdxl lora by @patrickvonplaten in #4797
- [LoRA Attn Processors] Refactor LoRA Attn Processors by @patrickvonplaten in #4765
- Update loaders.py by @chillpixelfun in #4805
- [WIP] Add Fabric by @shauray8 in #4201
- Fix save_path bug in textual inversion training script by @Yead in #4710
- [Examples] Save SDXL LoRA weights with chosen precision by @mnslarcher in #4791
- Fix Disentangle ONNX and non-ONNX pipeline by @DN6 in #4656
- fix bug in StableDiffusionXLControlNetPipeline when use guess_mode by @yiyixuxu in #4799
- fix auto_pipeline: pass kwargs to load_config by @yiyixuxu in #4793
- add StableDiffusionXLControlNetImg2ImgPipeline by @yiyixuxu in #4592
- add models for T2I-Adapter-XL by @MC-E in #4696
- Fuse loras by @patrickvonplaten in #4473
- Fix convert_original_stable_diffusion_to_diffusers script by @wingrime in #4817
- Support saving multiple t2i adapter models under one checkpoint by @VitjanZ in #4798
- fix typo by @zideliu in #4822
- VaeImageProcessor: Allow image resizing also for torch and numpy inputs by @gajendr-nikhil in #4832
- [Core] refactor encode_prompt by @sayakpaul in #4617
- Add loading ckpt from file for SDXL controlNet by @antigp in #4683
- Fix Unfuse Lora by @patrickvonplaten in #4833
- sketch inpaint from a1111 for non-inpaint models by @noskill in #4824
- [docs] SDXL by @stevhliu in #4428
- [Docs] improve the LoRA doc. by @sayakpaul in #4838
- Fix potential type mismatch errors in SDXL pipelines by @hyk1996 in #4796
- Fix image processor inputs width by @echarlaix in #4853
- Remove warn with deprecate by @patrickvonplaten in #4850
- [docs] ControlNet guide by @stevhliu in #4640
- [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858
- fix sdxl-inpaint fast test by @yiyixuxu in #4859
- [docs] Add inpainting example for forcing the unmasked area to remain unchanged to the docs by @dg845 in #4536
- Add GLIGEN Text Image implementation by @tuanh123789 in #4777
- Test Cleanup Precision issues by @DN6 in #4812
- Fix link from API to using-diffusers by @pcuenca in #4856
- [Docs] Korean translation update by @Snailpong in #4684
- fix a bug in sdxl-controlnet-img2img when using MultiControlNetModel by @yiyixuxu in #4862
- support AutoPipeline.from_pipe between a pipeline and its ControlNet pipeline counterpart by @yiyixuxu in #4861
- [WIP] masked_latent_inputs for inpainting pipeline by @yiyixuxu in #4819
- [docs] DiffEdit guide by @stevhliu in #4722
- [docs] Shap-E guide by @stevhliu in #4700
- [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL by @harutatsuakiyama in #4694
- [Tests] Add combined pipeline tests by @patrickvonplaten in #4869
- Retrieval Augmented Diffusion Models by @isamu-isozaki in #3297
- check for unet_lora_layers in sdxl pipeline's save_lora_weights method by @ErwannMillon in #4821
- Fix get_dummy_inputs for Stable Diffusion Inpaint Tests by @dg845 in #4845
- allow passing components to connected pipelines when use the combined pipeline by @yiyixuxu in #4883
- [Core] LoRA improvements pt. 3 by @sayakpaul in #4842
- Add dropout parameter to UNet2DModel/UNet2DConditionModel by @dg845 in #4882
- [Core] better support offloading when side loading is enabled. by @sayakpaul in #4855
- Add --vae_precision option to the SDXL pix2pix script so that we have… by @bghira in #4881
- [Test] Reduce CPU memory by @patrickvonplaten in #4897
- fix a bug in StableDiffusionUpscalePipeline.run_safety_checker by @yiyixuxu in #4886
- remove latent input for kandinsky prior_emb2emb pipeline by @yiyixuxu in #4887
- [docs] Add stronger warning for SDXL height/width by @stevhliu in #4867
- [Docs] add doc entry to explain lora fusion and use of different scales. by @sayakpaul in #4893
- [Textual inversion] Relax loading textual inversion by @patrickvonplaten in #4903
- [docs] Fix typo in Inpainting force unmasked area unchanged example by @dg845 in #4910
- Würstchen model by @kashif in #3849
- [InstructPix2Pix] Fix pipeline implementation and add docs by @sayakpaul in #4844
- [StableDiffusionXLAdapterPipeline] add adapter_conditioning_factor by @patil-suraj in #4937
- [StableDiffusionXLAdapterPipeline] allow negative micro conds by @patil-suraj in #4941
- [examples] T2IAdapter training script by @patil-suraj in #4934
- [Tests] add: tests for t2i adapter training. by @sayakpaul in #4947
- guard save model hooks to only execute on main process by @williamberman in #4929
- [Docs] add t2i adapter entry to overview of training scripts. by @sayakpaul in #4946
- Temp Revert "[Core] better support offloading when side loading is enabled… by @williamberman in #4927
- Revert revert and install accelerate main by @williamberman in #4963
- [Docs] fix: minor formatting in the Würstchen docs by @sayakpaul in #4965
- Lazy Import for Diffusers by @DN6 in #4829
- [Core] Remove TF import checks by @patrickvonplaten in #4968
- Make sure Flax pipelines can be loaded into PyTorch by @patrickvonplaten in #4971
- Update README.md by @patrickvonplaten in #4973
- Wuerstchen fixes by @kashif in #4942
- Refactor model offload by @patrickvonplaten in #4514
- [Bug Fix] Should pass the dtype instead of torch_dtype by @zhiqiang-canva in #4917
- [Utils] Correct custom init sort by @patrickvonplaten in #4967
- remove extra gligen in import by @DN6 in #4987
- fix E721 Do not compare types, use `isinstance()` by @kashif in #4992
- [Wuerstchen] fix combined pipeline's num_images_per_prompt by @kashif in #4989
- fix image variation slow test by @DN6 in #4995
- fix custom diffusion tests by @DN6 in #4996
- [Lora] Speed up lora loading by @patrickvonplaten in #4994
- [docs] Fix DiffusionPipeline.enable_sequential_cpu_offload docstring by @dg845 in #4952
- Fix safety checker seq offload by @patrickvonplaten in #4998
- Fix PR template by @stevhliu in #4984
- examples fix t2i training by @patrickvonplaten in #5001
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @xhinker
    - Add SDXL long weighted prompt pipeline (replace pr:4629) (#4661)
    - fix sdxl_lwp empty neg_prompt error issue (#4743)
- @zideliu
    - Add reference_attn & reference_adain support for sdxl (#4502)
- @shauray8
    - [WIP] Add Fabric (#4201)
- @MC-E
    - add models for T2I-Adapter-XL (#4696)
- @tuanh123789
    - Add GLIGEN Text Image implementation (#4777)
- @Snailpong
    - [Docs] Korean translation update (#4684)
- @harutatsuakiyama
    - [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL (#4694)
- @isamu-isozaki
    - Retrieval Augmented Diffusion Models (#3297)