pypi diffusers 0.29.1
v0.29.1: SD3 ControlNet, Expanded SD3 `from_single_file` support, Using long Prompts with T5 Text Encoder & Bug fixes

4 months ago

SD3 ControlNet

import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image

controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)

pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")
control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg")
prompt = "A girl holding a sign that says InstantX"
image = pipe(prompt, control_image=control_image, controlnet_conditioning_scale=0.7).images[0]
image.save("sd3.png")

📜 Refer to the official docs to learn more about it.

Thanks to @haofanwang @wangqixun from the @ResearcherXman team for contributing this pipeline!

Expanded single file support

We now support all available single-file checkpoints for SD3 in diffusers! To load the single-file checkpoint that includes the T5 text encoder:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save('sd3-single-file-t5-fp8.png')

Using Long Prompts with the T5 Text Encoder

We increased the default sequence length for the T5 Text Encoder from 77 to 256 tokens! It can be adjusted to accept fewer or more tokens by setting max_sequence_length, up to a maximum of 512. Keep in mind that longer sequences require additional resources and result in longer generation times. This effect is particularly noticeable during batch inference.

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus. This imaginative creature features the distinctive, bulky body of a hippo, but with a texture and appearance resembling a golden-brown, crispy waffle. The creature might have elements like waffle squares across its skin and a syrup-like sheen. It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting, possibly including oversized utensils or plates in the background. The image should evoke a sense of playful absurdity and culinary fantasy."

image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
[Image comparison, images not shown: Before | max_sequence_length=256 | max_sequence_length=512]
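Since longer sequences cost more memory and time, it can help to raise max_sequence_length only when a prompt actually needs it. The following is a minimal sketch of that idea, not part of diffusers: the helper names pick_max_sequence_length and will_truncate are hypothetical, and the only values taken from these notes are the default of 256 and the maximum of 512. The token counts in the example are made up.

```python
# Hypothetical helpers for illustration only; they are not part of diffusers.
T5_DEFAULT_LENGTH = 256  # default sequence length stated in these release notes
T5_MAX_LENGTH = 512      # maximum sequence length stated in these release notes


def pick_max_sequence_length(num_prompt_tokens, default=T5_DEFAULT_LENGTH, limit=T5_MAX_LENGTH):
    """Keep the default unless the prompt needs more room; never exceed the limit."""
    if num_prompt_tokens < 0:
        raise ValueError("num_prompt_tokens must be non-negative")
    if num_prompt_tokens <= default:
        return default
    return min(num_prompt_tokens, limit)


def will_truncate(num_prompt_tokens, limit=T5_MAX_LENGTH):
    """Return True when the prompt is longer than the largest allowed length."""
    return num_prompt_tokens > limit


# Example with made-up token counts.
for n in (40, 300, 700):
    print(n, pick_max_sequence_length(n), will_truncate(n))
```

The token count itself would have to come from the T5 tokenizer. Assuming the pipeline exposes it as pipe.tokenizer_3, something like len(pipe.tokenizer_3(prompt).input_ids) should give a usable estimate, and the result can then be passed as max_sequence_length in the pipe(...) call.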

All commits

  • fix warning log for Transformer SD3 by @sayakpaul in (#8496)
  • Add SD3 AutoPipeline mappings by @Beinsezii in (#8489)
  • Add Hunyuan AutoPipe mapping by @Beinsezii in (#8505)
  • Expand Single File support in SD3 Pipeline by @DN6 in (#8517)
  • [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed by @DN6 in (#8462)
  • Fix sharding when no device_map is passed by @SunMarc in (#8531)
  • [SD3 Inference] T5 Token limit by @asomoza in (#8506)
  • Fix gradient checkpointing issue for Stable Diffusion 3 by @Carolinabanana in (#8542)
  • Support SD3 ControlNet and Multi-ControlNet. by @wangqixun in (#8566)
  • fix from_single_file for checkpoints with t5 by @yiyixuxu in (#8631)
  • [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in (#8558)
