github huggingface/diffusers v0.19.0
v0.19.0: SD-XL 1.0 (permissive license), AutoPipelines, Improved Kandinsky & Asymmetric VQGAN, T2I Adapter


SDXL 1.0

Stable Diffusion XL (SDXL) 1.0 was released today under the permissive CreativeML Open RAIL++-M license. We provide full compatibility with SDXL in diffusers.

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image


Many additional cool features ship with this release; refer to the documentation to know more.

New training scripts for SDXL

When there’s a new pipeline, there ought to be new training scripts. We added support for training scripts that build on top of SDXL, including ControlNet and InstructPix2Pix.

Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.

New pipelines for SDXL

The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:

  • StableDiffusionXLControlNetPipeline
  • StableDiffusionXLInstructPix2PixPipeline

The ControlNet and InstructPix2Pix pipelines don’t have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.
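In the meantime, here is a minimal sketch of how a trained SDXL ControlNet could be plugged in. The controlnet checkpoint path below is a placeholder for whatever the training script produces, and the conditioning image is a Canny edge map from the Hub test assets:

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Placeholder: point this at a ControlNet trained with the SDXL ControlNet training script.
controlnet = ControlNetModel.from_pretrained("path/to/your-sdxl-controlnet", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

# Conditioning image: a Canny edge map from the diffusers test assets.
canny_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
)

image = pipe(prompt="a colorful bird, highly detailed, 8k", image=canny_image).images[0]
image.save("image.png")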

Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline in #4079.

The AutoPipeline API

We now support Auto APIs for three tasks: text-to-image (AutoPipelineForText2Image), image-to-image (AutoPipelineForImage2Image), and inpainting (AutoPipelineForInpainting).

Here is how to use one:

from diffusers import AutoPipelineForText2Image
import torch

pipe_t2i = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")

prompt = "photo a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")

Without using any extra memory, you can then switch to image-to-image:

from diffusers import AutoPipelineForImage2Image

pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)

image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")
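The inpainting task works the same way. Here is a minimal sketch, assuming the runwayml/stable-diffusion-inpainting checkpoint and the image/mask pair used in the inpainting documentation:

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch

pipe_inpaint = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = load_image(
    "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sII6.png"
)
# White pixels in the mask are repainted, black pixels are preserved.
mask_image = load_image(
    "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sII6_mask.png"
)

image = pipe_inpaint(
    prompt="a white cat sitting on a bench, high quality", image=init_image, mask_image=mask_image
).images[0]
image.save("image.png")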

Supported Pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.

Refer to the documentation to know more.

A new “combined pipeline” for the Kandinsky series

We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0] 
image.save("image.png")

The new combined pipelines can be accessed directly or via the "Auto" pipelines shown above. To know more, check out the documentation.
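For example, here is a rough sketch of Kandinsky 2.2 image-to-image through the Auto API, assuming it resolves the decoder checkpoint to the corresponding combined pipeline (the cat image is a test asset from the Hub):

import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe_i2i = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe_i2i.enable_model_cpu_offload()

init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"
)

prompt = "a cat wearing a crown, highly detailed, fantasy art"
image = pipe_i2i(prompt=prompt, image=init_image, strength=0.75, num_inference_steps=25).images[0]
image.save("image.png")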

🚨🚨🚨 Breaking change for Kandinsky Mask Inpainting 🚨🚨🚨

NOW: mask_image repaints white pixels and preserves black pixels.

Kandinsky was using an incorrect mask format. Instead of treating white pixels as the mask (as SD & IF do), the Kandinsky models treated black pixels as the mask. This has been corrected so that the diffusers API stays aligned: we cannot have different mask formats for different pipelines.

Important: everyone who already uses Kandinsky inpainting in production or in a pipeline now needs to invert the mask as follows:

# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)

# For PyTorch and Numpy input
mask = 1 - mask
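Under the new convention, a white (or 1-valued) region in the mask is the area that gets repainted. Here is a minimal sketch with the Kandinsky 2.2 inpainting checkpoint, repainting the area above the cat's head:

import numpy as np
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky/cat.png"
)

# 1 (white) = repaint, 0 (black) = preserve
mask = np.zeros((768, 768), dtype=np.float32)
mask[:250, 250:-250] = 1

image = pipe(
    prompt="a hat", image=init_image, mask_image=mask, height=768, width=768, num_inference_steps=25
).images[0]
image.save("image.png")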

Asymmetric VQGAN

Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well-suited for inpainting tasks. This release brings support for this new VQGAN. Here is how it can be used:

from io import BytesIO
from PIL import Image
import requests
from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline

def download_image(url: str) -> Image.Image:
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")

prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"

image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")

image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")

Refer to the documentation to know more.

Thanks to @cross-attention for contributing this model in #3956.

Improved support for loading Kohya-style LoRA checkpoints

We are committed to providing seamless interoperability with Kohya-trained checkpoints in diffusers. To that end, we improved the existing support for loading Kohya-trained LoRA checkpoints. Users can expect further improvements in upcoming releases.
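Here is a rough sketch of the loading path; the LoRA folder and file name are placeholders for any Kohya-style .safetensors checkpoint:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Placeholder: a local folder (or Hub repo id) containing a Kohya-style LoRA .safetensors file.
pipe.load_lora_weights("path/to/lora-folder", weight_name="my_kohya_lora.safetensors")

image = pipe("masterpiece, best quality, a mountain lake at sunrise").images[0]
image.save("image.png")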

Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.

T2I Adapter

The example below runs a T2I Adapter end to end: ZoeDepth (loaded via torch.hub) estimates a depth map for an input image, and the TencentARC/t2iadapter_zoedepth_sd15v1 adapter conditions Stable Diffusion v1.5 on that depth map through the StableDiffusionAdapterPipeline.

# pip install matplotlib
from PIL import Image
import torch
import numpy as np
import matplotlib.cm
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline

def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
    """Converts a depth map to a color image.

    Args:
        value (torch.Tensor, numpy.ndarray): Input depth map. Shape: (H, W) or (1, H, W) or (1, 1, H, W). All singular dimensions are squeezed.
        vmin (float, optional): vmin-valued entries are mapped to the start color of cmap. If None, the 2nd percentile of the valid values is used. Defaults to None.
        vmax (float, optional): vmax-valued entries are mapped to the end color of cmap. If None, the 85th percentile of the valid values is used. Defaults to None.
        cmap (str, optional): matplotlib colormap to use. Defaults to 'gray_r'.
        invalid_val (int, optional): Value of invalid pixels that should be colored with 'background_color'. Defaults to -99.
        invalid_mask (numpy.ndarray, optional): Boolean mask for invalid regions. Defaults to None.
        background_color (tuple[int], optional): RGBA color (4-tuple) to give to invalid pixels. Defaults to (128, 128, 128, 255).
        gamma_corrected (bool, optional): Apply gamma correction to colored image. Defaults to False.
        value_transform (Callable, optional): Apply transform function to valid pixels before coloring. Defaults to None.

    Returns:
        numpy.ndarray, dtype - uint8: Colored depth map. Shape: (H, W, 4)
    """
    if isinstance(value, torch.Tensor):
        value = value.detach().cpu().numpy()

    value = value.squeeze()
    if invalid_mask is None:
        invalid_mask = value == invalid_val
    mask = np.logical_not(invalid_mask)

    # normalize
    vmin = np.percentile(value[mask],2) if vmin is None else vmin
    vmax = np.percentile(value[mask],85) if vmax is None else vmax
    if vmin != vmax:
        value = (value - vmin) / (vmax - vmin)  # vmin..vmax
    else:
        # Avoid 0-division
        value = value * 0.

    # grey out the invalid values
    value[invalid_mask] = np.nan
    cmapper = matplotlib.cm.get_cmap(cmap)
    if value_transform:
        value = value_transform(value)
        # value = value / value.max()
    value = cmapper(value, bytes=True)  # (nxmx4)

    img = value[...]
    img[invalid_mask] = background_color

    if gamma_corrected:
        img = img / 255
        img = np.power(img, 2.2)
        img = img * 255
        img = img.astype(np.uint8)
    return img

# Estimate a depth map for the input image with ZoeDepth (loaded via torch.hub)
model = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True)

img = Image.open('./images/zoedepth_in.png')

out = model.infer_pil(img)

zoedepth_image = Image.fromarray(colorize(out)).convert('RGB')

zoedepth_image.save('images/zoedepth.png')

# Condition Stable Diffusion v1.5 on the colorized depth map via the ZoeDepth T2I Adapter
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_zoedepth_sd15v1", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, safety_checker=None, torch_dtype=torch.float16, variant="fp16"
)

pipe.to('cuda')
zoedepth_image_out = pipe(prompt="motorcycle", image=zoedepth_image).images[0]

zoedepth_image_out.save('images/zoedepth_out.png')

(Images: the input photo, the colorized ZoeDepth depth map, and the generated output.)

All commits

Significant community contributions

A number of community contributors made significant changes to the library over the last release.
