SDXL 1.0
Stable Diffusion XL (SDXL) 1.0, with the permissive CreativeML Open RAIL++-M License, was released today. We provide full compatibility with SDXL in diffusers.
from diffusers import DiffusionPipeline
import torch
pipe = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image
Many additional cool features are released:
- Pipelines for:
  - Img2Img
  - Inpainting
- Torch compile support
- Model offloading
- Ensemble of Expert Denoisers (the eDiffi approach; see the sketch below) - thanks to @bghira @SytanSD @Birch-san @AmericanPresidentJimmyCarter
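As a quick illustration of the ensemble-of-expert-denoisers workflow, here is a minimal sketch that assumes the stabilityai/stable-diffusion-xl-refiner-1.0 checkpoint: the base model handles the first portion of the denoising trajectory and the refiner finishes it.
from diffusers import DiffusionPipeline
import torch

base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# The base model denoises the first 80% of the steps and returns latents ...
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
# ... which the refiner denoises for the remaining 20% of the steps.
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]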
Refer to the documentation to know more.
New training scripts for SDXL
When there’s a new pipeline, there ought to be new training scripts. We added support for training scripts that build on top of SDXL, including DreamBooth LoRA, ControlNet, and InstructPix2Pix.
Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.
New pipelines for SDXL
The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:
- StableDiffusionXLControlNetPipeline
- StableDiffusionXLInstructPix2PixPipeline
The ControlNet and InstructPix2Pix pipelines don’t have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.
Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline in #4079.
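For illustration, here is a rough sketch of how the new ControlNet pipeline can be invoked; the ControlNet checkpoint name and conditioning image below are placeholders, since there are no interesting SDXL ControlNet checkpoints yet.
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
import torch

# Placeholder checkpoint -- swap in a ControlNet trained with the new SDXL ControlNet training script.
controlnet = ControlNetModel.from_pretrained("your-username/your-sdxl-controlnet", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

conditioning = load_image("https://example.com/conditioning.png")  # e.g. a canny edge map (placeholder URL)
image = pipe(prompt="aerial view of a futuristic city", image=conditioning).images[0]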
The AutoPipeline API
We now support "Auto" APIs for the following tasks: text-to-image, image-to-image, and inpainting, via AutoPipelineForText2Image, AutoPipelineForImage2Image, and AutoPipelineForInpainting.
Here is how to use one:
from diffusers import AutoPipelineForText2Image
import torch

pipe_t2i = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")

prompt = "photo of a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")
Without using any extra memory, you can then switch to image-to-image:
from diffusers import AutoPipelineForImage2Image

pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)
image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")
Supported Pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.
Refer to the documentation to know more.
A new “combined pipeline” for the Kandinsky series
We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()
prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("image.png")
The following pipelines, which can be accessed via the "Auto" pipelines, were added (see the sketch after the list):
- KandinskyCombinedPipeline
- KandinskyImg2ImgCombinedPipeline
- KandinskyInpaintCombinedPipeline
- KandinskyV22CombinedPipeline
- KandinskyV22Img2ImgCombinedPipeline
- KandinskyV22InpaintCombinedPipeline
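For example, here is a minimal sketch of the combined image-to-image pipeline resolved through the Auto API, reusing the image generated above (the prompt and strength value are illustrative):
from diffusers import AutoPipelineForImage2Image

pipe_i2i = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe_i2i.enable_model_cpu_offload()
image = pipe_i2i(prompt="A lion in galaxies, watercolor style", image=image, strength=0.4).images[0]
image.save("image.png")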
Refer to the documentation to know more.
🚨🚨🚨 Breaking change for Kandinsky Mask Inpainting 🚨🚨🚨
NOW: mask_image repaints white pixels and preserves black pixels.
Kandinsky was using an incorrect mask format. Instead of using white pixels as the mask (like SD & IF do), the Kandinsky models were using black pixels. This needed to be corrected so that the diffusers API stays aligned; we cannot have different mask formats for different pipelines.
Important => This means that everyone who already uses Kandinsky inpainting in production/pipelines now needs to invert the mask:
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)
# For PyTorch and Numpy input
mask = 1 - mask
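For reference, here is a minimal sketch of Kandinsky inpainting under the new convention, assuming init_image is a 768x768 PIL image; the white square in the mask is repainted while everything else is preserved.
import numpy as np
import torch
from diffusers import AutoPipelineForInpainting

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

mask = np.zeros((768, 768), dtype=np.float32)
mask[250:500, 250:500] = 1.0  # white (1.0) region is repainted, black (0.0) is preserved
image = pipe(prompt="a golden retriever", image=init_image, mask_image=mask).images[0]
image.save("image.png")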
Asymmetric VQGAN
The paper Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well-suited for inpainting tasks. This release brings support for this new VQGAN. Here is how it can be used:
from io import BytesIO
from PIL import Image
import requests
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline
def download_image(url: str) -> Image.Image:
response = requests.get(url)
return Image.open(BytesIO(response.content)).convert("RGB")
prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"
image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")
image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")
Refer to the documentation to know more.
Thanks to @cross-attention for contributing this model in #3956.
Improved support for loading Kohya-style LoRA checkpoints
We are committed to providing seamless interoperability support of Kohya-trained checkpoints from diffusers
. To that end, we improved the existing support for loading Kohya-trained checkpoints in diffusers
. Users can expect further improvements in the upcoming releases.
Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.
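As an illustration, here is a minimal sketch of loading a Kohya-style LoRA with load_lora_weights; the repository and file name below are placeholders.
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Placeholder repo/file: point this at a Kohya-trained LoRA in safetensors format.
pipe.load_lora_weights("your-username/your-kohya-lora", weight_name="lora.safetensors")
image = pipe(
    "masterpiece, best quality, mountain landscape",
    cross_attention_kwargs={"scale": 0.8},  # scale the LoRA contribution
).images[0]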
T2I Adapter
This release also adds a StableDiffusionAdapterPipeline for T2I-Adapter conditioning. The example below conditions Stable Diffusion v1.5 on a depth map produced with ZoeDepth (matplotlib is only needed to colorize the depth map):
pip install matplotlib
from PIL import Image
import torch
import numpy as np
import matplotlib
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline
def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
"""Converts a depth map to a color image.
Args:
value (torch.Tensor, numpy.ndarray): Input depth map. Shape: (H, W) or (1, H, W) or (1, 1, H, W). All singular dimensions are squeezed
vmin (float, optional): vmin-valued entries are mapped to start color of cmap. If None, value.min() is used. Defaults to None.
vmax (float, optional): vmax-valued entries are mapped to end color of cmap. If None, value.max() is used. Defaults to None.
cmap (str, optional): matplotlib colormap to use. Defaults to 'gray_r'.
invalid_val (int, optional): Specifies value of invalid pixels that should be colored as 'background_color'. Defaults to -99.
invalid_mask (numpy.ndarray, optional): Boolean mask for invalid regions. Defaults to None.
background_color (tuple[int], optional): 4-tuple RGBA color to give to invalid pixels. Defaults to (128, 128, 128, 255).
gamma_corrected (bool, optional): Apply gamma correction to colored image. Defaults to False.
value_transform (Callable, optional): Apply transform function to valid pixels before coloring. Defaults to None.
Returns:
numpy.ndarray, dtype - uint8: Colored depth map. Shape: (H, W, 4)
"""
if isinstance(value, torch.Tensor):
value = value.detach().cpu().numpy()
value = value.squeeze()
if invalid_mask is None:
invalid_mask = value == invalid_val
mask = np.logical_not(invalid_mask)
# normalize
vmin = np.percentile(value[mask],2) if vmin is None else vmin
vmax = np.percentile(value[mask],85) if vmax is None else vmax
if vmin != vmax:
value = (value - vmin) / (vmax - vmin) # vmin..vmax
else:
# Avoid 0-division
value = value * 0.
# grey out the invalid values
value[invalid_mask] = np.nan
cmapper = matplotlib.cm.get_cmap(cmap)
if value_transform:
value = value_transform(value)
# value = value / value.max()
value = cmapper(value, bytes=True) # (nxmx4)
img = value[...]
img[invalid_mask] = background_color
if gamma_corrected:
img = img / 255
img = np.power(img, 2.2)
img = img * 255
img = img.astype(np.uint8)
return img
model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)
img = Image.open('./images/zoedepth_in.png')
out = model.infer_pil(img)
zoedepth_image = Image.fromarray(colorize(out)).convert('RGB')
zoedepth_image.save('images/zoedepth.png')
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_zoedepth_sd15v1", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5", adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16"
)
pipe.to('cuda')
zoedepth_image_out = pipe(prompt="motorcycle", image=zoedepth_image).images[0]
zoedepth_image_out.save('images/zoedepth_out.png')
All commits
- 📝 Fix broken link to models documentation by @kadirnar in #4026
- move to 0.19.0dev by @patrickvonplaten in #4048
- [SDXL] Partial diffusion support for Text2Img and Img2Img Pipelines by @bghira in #4015
- Correct sdxl docs by @patrickvonplaten in #4058
- Add circular padding for artifact-free StableDiffusionPanoramaPipeline by @EvgenyKashin in #4025
- Update train_unconditional.py by @hjmnbnb in #3899
- Trigger CI on ci-* branches by @Wauplin in #3635
- Fix kandinsky remove safety by @patrickvonplaten in #4065
- Multiply lr scheduler steps by num_processes by @eliphatfs in #3983
- [Community] Implementation of the IADB community pipeline by @tchambon in #3996
- add kandinsky to readme table by @yiyixuxu in #4081
- [From Single File] Force accelerate to be installed by @patrickvonplaten in #4078
- fix requirement in SDXL by @killah-t-cell in #4082
- fix: minor things in the SDXL docs. by @sayakpaul in #4070
- [Invisible watermark] Correct version by @patrickvonplaten in #4087
- [Feat] add: utility for unloading lora. by @sayakpaul in #4034
- [tests] use parent class for monkey patching to not break other tests by @patrickvonplaten in #4088
- Allow low precision vae sd xl by @patrickvonplaten in #4083
- [SD-XL] Add inpainting by @patrickvonplaten in #4098
- [Stable Diffusion Inpaint ]Fix dtype inpaint by @patrickvonplaten in #4113
- [From ckpt] replace with os path join by @patrickvonplaten in #3746
- [From single file] Make accelerate optional by @patrickvonplaten in #4132
- add noise_sampler_seed to StableDiffusionKDiffusionPipeline.__call__ by @sunhs in #3911
- Make setup.py compatible with pipenv by @apoorvaeternity in #4121
- 📝 Update doc with more descriptive title and filename for "IF" section by @kadirnar in #4049
- t2i pipeline by @williamberman in #3932
- [Docs] Korean translation update by @Snailpong in #4022
- [Enhance] Add rank in dreambooth by @okotaku in #4112
- Refactor execution device & cpu offload by @patrickvonplaten in #4114
- Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler by @clarencechen in #3865
- [Core] add: controlnet support for SDXL by @sayakpaul in #4038
- Docs/bentoml integration by @larme in #4090
- Fixed SDXL single file loading to use the correct requested pipeline class by @Mystfit in #4142
- feat: add act_fn param to OutValueFunctionBlock by @SauravMaheshkar in #3994
- Add controlnet and vae from single file by @patrickvonplaten in #4084
- fix incorrect attention head dimension in AttnProcessor2_0 by @zhvng in #4154
- Fix bug in ControlNetPipelines with MultiControlNetModel of length 1 by @greentfrapp in #4032
- Asymmetric vqgan by @cross-attention in #3956
- Shap-E: add support for mesh output by @yiyixuxu in #4062
- [From single file] Make sure that controlnet stays False for from_single_file by @patrickvonplaten in #4181
- [ControlNet Training] Remove safety from controlnet by @patrickvonplaten in #4180
- remove bentoml doc in favor of blogpost by @williamberman in #4182
- Fix unloading of LoRAs when xformers attention procs are in use by @isidentical in #4179
- [Safetensors] make safetensors a required dep by @patrickvonplaten in #4177
- make enable_sequential_cpu_offload more generic for third-party devices by @statelesshz in #4191
- Allow passing different prompts to each text_encoder on stable_diffusion_xl pipelines by @apolinario in #4156
- [SDXL ControlNet Training] Follow-up fixes by @sayakpaul in #4188
- 📄 Renamed File for Better Understanding by @kadirnar in #4056
- [docs] Clean up pipeline apis by @stevhliu in #3905
- docs: Typo in dreambooth example README.md by @askulkarni2 in #4203
- [fix] network_alpha when loading unet lora from old format by @Jackmin801 in #4221
- fix no CFG for kandinsky pipelines by @yiyixuxu in #4193
- fix a bug of prompt embeds in sdxl by @xiaohu2015 in #4099
- Raise initial HTTPError if pipeline is not cached locally by @Wauplin in #4230
- [SDXL] Fix sd xl encode prompt by @patrickvonplaten in #4237
- [SD-XL] Fix sdxl controlnet inference by @patrickvonplaten in #4238
- [docs] Changed path for ControlNet in docs by @rcmtcristian in #4215
- Allow specifying denoising_start and denoising_end as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers by @AmericanPresidentJimmyCarter in #4115
- [docs] Other modalities by @stevhliu in #4205
- docs: Add missing import statement in textual_inversion inference example by @askulkarni2 in #4227
- [Docs] Fix from pretrained docs by @patrickvonplaten in #4240
- [ControlNet SDXL training] fixes in the training script by @sayakpaul in #4223
- [SDXL DreamBooth LoRA] add support for text encoder fine-tuning by @sayakpaul in #4097
- Resolve bf16 error as mentioned in this issue by @nupurkmr9 in #4214
- do not pass list to accelerator.init_trackers by @williamberman in #4248
- [From Single File] Allow vae to be loaded by @patrickvonplaten in #4242
- [SDXL] Improve docs by @patrickvonplaten in #4196
- [draft v2] AutoPipeline by @yiyixuxu in #4138
- Update README_sdxl.md to change the note on default hyperparameters by @sayakpaul in #4258
- [from_single_file] Fix circular import by @patrickvonplaten in #4259
- Model path for sdxl wrong in dreambooth README by @rrva in #4261
- [SDXL and IP2P]: instruction pix2pix XL training and pipeline by @harutatsuakiyama in #4079
- [docs] Fix image in SDXL docs by @stevhliu in #4267
- [SDXL DreamBooth LoRA] multiple fixes by @sayakpaul in #4262
- Load Kohya-ss style LoRAs with auxilary states by @isidentical in #4147
- Fix all missing optional import statements from pipeline folders by @patrickvonplaten in #4272
- [Kandinsky] Add combined pipelines / Fix cpu model offload / Fix inpainting by @patrickvonplaten in #4207
- Where did this 'x' come from, Elon? by @camenduru in #4277
- add openvino and onnx runtime SD XL documentation by @echarlaix in #4285
- Rename by @patrickvonplaten in #4294
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @Snailpong
- [Docs] Korean translation update (#4022)
- @clarencechen
- Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler (#3865)
- @cross-attention
- Asymmetric vqgan (#3956)
- @AmericanPresidentJimmyCarter
- Allow specifying denoising_start and denoising_end as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers (#4115)
- @harutatsuakiyama
- [SDXL and IP2P]: instruction pix2pix XL training and pipeline (#4079)