SD3 ControlNet
import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")
control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg")
prompt = "A girl holding a sign that says InstantX"
image = pipe(prompt, control_image=control_image, controlnet_conditioning_scale=0.7).images[0]
image.save("sd3.png")
Refer to the official docs to learn more about it.
Thanks to @haofanwang @wangqixun from the @ResearcherXman team for contributing this pipeline!
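The example above fixes controlnet_conditioning_scale at 0.7. To see how strongly the control image steers the result, it can help to render the same prompt at several scales. The following is a minimal sketch, not part of the release: the helper name `sweep_conditioning_scale` and the default scale values are hypothetical, and it only relies on the call signature already shown above (`control_image`, `controlnet_conditioning_scale`, `.images[0]`).

```python
def sweep_conditioning_scale(pipe, prompt, control_image,
                             scales=(0.3, 0.5, 0.7, 1.0), **pipe_kwargs):
    """Call the pipeline once per conditioning scale.

    Hypothetical helper: `pipe` is assumed to be the
    StableDiffusion3ControlNetPipeline built in the example above.
    Returns a dict mapping each scale to the first generated image.
    """
    results = {}
    for scale in scales:
        output = pipe(
            prompt,
            control_image=control_image,
            controlnet_conditioning_scale=scale,
            **pipe_kwargs,  # extra arguments are forwarded unchanged
        )
        results[scale] = output.images[0]
    return results


# Usage with the objects from the example above (needs a GPU and the model weights):
# images = sweep_conditioning_scale(pipe, prompt, control_image)
# for scale, img in images.items():
#     img.save(f"sd3-controlnet-scale-{scale}.png")
```

Each scale triggers a full generation, so the run time grows linearly with the number of scales tried.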
Expanded single file support
We now support all available single-file checkpoints for SD3 in diffusers! To load the single-file checkpoint that includes the T5 text encoder:
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save('sd3-single-file-t5-fp8.png')
Using Long Prompts with the T5 Text Encoder
We increased the default sequence length for the T5 Text Encoder from a maximum of 77 to 256! It can be adjusted to accept fewer or more tokens by setting max_sequence_length, up to a maximum of 512. Keep in mind that longer sequences require additional resources and will result in longer generation times. This effect is particularly noticeable during batch inference.
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus. This imaginative creature features the distinctive, bulky body of a hippo, but with a texture and appearance resembling a golden-brown, crispy waffle. The creature might have elements like waffle squares across its skin and a syrup-like sheen. It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting, possibly including oversized utensils or plates in the background. The image should evoke a sense of playful absurdity and culinary fantasy."
image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
Comparison images (omitted): Before, max_sequence_length=256, max_sequence_length=512.
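For a rough feel of why longer T5 sequences cost more, one can count the tokens the SD3 transformer attends over at each denoising step. This is a back-of-the-envelope sketch under stated assumptions, not a measurement: it assumes a VAE downscaling factor of 8, a patch size of 2, and 77 CLIP tokens concatenated with the T5 tokens, and the function name `joint_sequence_length` is hypothetical.

```python
def joint_sequence_length(height, width, max_sequence_length,
                          vae_scale_factor=8, patch_size=2, clip_tokens=77):
    """Estimate image + text tokens seen by the transformer per step.

    All default values are assumptions about SD3 Medium, not values
    read from a loaded pipeline.
    """
    # Pixel resolution -> latent resolution -> number of latent patches.
    latent_h = height // vae_scale_factor
    latent_w = width // vae_scale_factor
    image_tokens = (latent_h // patch_size) * (latent_w // patch_size)
    # CLIP tokens plus T5 tokens padded/truncated to max_sequence_length.
    text_tokens = clip_tokens + max_sequence_length
    return image_tokens + text_tokens


for t5_len in (77, 256, 512):
    total = joint_sequence_length(1024, 1024, t5_len)
    print(f"max_sequence_length={t5_len}: about {total} tokens per sample")
```

Under these assumptions the text portion is a small share of the total at 1024x1024, but it applies to every sample in a batch and the T5 encoder itself also processes the longer input, which fits the note above about batch inference.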
All commits
- fix warning log for Transformer SD3 by @sayakpaul in (#8496)
- Add SD3 AutoPipeline mappings by @Beinsezii in (#8489)
- Add Hunyuan AutoPipe mapping by @Beinsezii in (#8505)
- Expand Single File support in SD3 Pipeline by @DN6 in (#8517)
- [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed by @DN6 in (#8462)
- Fix sharding when no device_map is passed by @SunMarc in (#8531)
- [SD3 Inference] T5 Token limit by @asomoza in (#8506)
- Fix gradient checkpointing issue for Stable Diffusion 3 by @Carolinabanana in (#8542)
- Support SD3 ControlNet and Multi-ControlNet. by @wangqixun in (#8566)
- fix from_single_file for checkpoints with t5 by @yiyixuxu in (#8631)
- [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in (#8558)