SD3 ControlNet
![image](https://private-user-images.githubusercontent.com/46553287/339954565-db384753-cfbb-488c-bc74-8280f9bee24e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg5Njg4ODMsIm5iZiI6MTcxODk2ODU4MywicGF0aCI6Ii80NjU1MzI4Ny8zMzk5NTQ1NjUtZGIzODQ3NTMtY2ZiYi00ODhjLWJjNzQtODI4MGY5YmVlMjRlLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIxVDExMTYyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWVkMGM0ODY2NTMzNTgzNzYzMWVlZmE3MmZhODA2MTc1MWY0NzUzYTVmZDUyOWQ5M2IyNGQzZTM1ZDBlY2MwNjImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.XY7ZV8RFBfU5J46TsW8NQ6wj_O81wO8FZEAdvbqqxOA)
import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image
controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)
pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")
control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg")
prompt = "A girl holding a sign that says InstantX"
image = pipe(prompt, control_image=control_image, controlnet_conditioning_scale=0.7).images[0]
image.save("sd3.png")
Refer to the official docs to learn more about it.
Thanks to @haofanwang @wangqixun from the @ResearcherXman team for contributing this pipeline!
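This release also adds Multi-ControlNet support (note the `SD3MultiControlNetModel` import above). The following is a minimal sketch of how several ControlNets might be combined, not the contributors' reference usage. `normalize_scales` and `build_multi_controlnet_pipeline` are hypothetical helper names, and the sketch assumes the pipeline takes one control image and one conditioning scale per ControlNet, passed as lists.

```python
from typing import List, Sequence, Union


def normalize_scales(num_controlnets: int, scales: Union[float, Sequence[float]]) -> List[float]:
    """Hypothetical helper: return exactly one conditioning scale per ControlNet."""
    if isinstance(scales, (int, float)):
        # A single number is repeated for every ControlNet.
        return [float(scales)] * num_controlnets
    scales = [float(s) for s in scales]
    if len(scales) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} scales, got {len(scales)}"
        )
    return scales


def build_multi_controlnet_pipeline(controlnet_ids: Sequence[str]):
    """Hypothetical helper: load several SD3 ControlNets into one pipeline."""
    # Imported lazily so the helper above can be used without a GPU setup.
    import torch
    from diffusers import StableDiffusion3ControlNetPipeline
    from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel

    controlnets = [
        SD3ControlNetModel.from_pretrained(model_id, torch_dtype=torch.float16)
        for model_id in controlnet_ids
    ]
    pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        controlnet=SD3MultiControlNetModel(controlnets),
        torch_dtype=torch.float16,
    )
    return pipe.to("cuda")
```

With a pipeline built this way, a call would look roughly like `pipe(prompt, control_image=[image_a, image_b], controlnet_conditioning_scale=normalize_scales(2, 0.7))`, where `image_a` and `image_b` are placeholder control images matching the order of the ControlNet IDs.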
Expanded single file support
We now support all available single-file checkpoints for SD3 in diffusers! To load the single-file checkpoint that includes the T5 text encoder:
import torch
from diffusers import StableDiffusion3Pipeline
pipe = StableDiffusion3Pipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save('sd3-single-file-t5-fp8.png')
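For a single-file checkpoint that ships without T5, the third text encoder can be disabled at load time. The sketch below assumes a checkpoint named `sd3_medium_incl_clips.safetensors` exists in the same repository (that filename is not taken from these notes); `checkpoint_url` and `load_sd3_without_t5` are hypothetical helper names.

```python
REPO_BLOB_URL = "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/"


def checkpoint_url(filename: str) -> str:
    """Hypothetical helper: build the URL of a single-file checkpoint."""
    return REPO_BLOB_URL + filename


def load_sd3_without_t5(filename: str = "sd3_medium_incl_clips.safetensors"):
    """Hypothetical helper: load a CLIP-only checkpoint (filename is an assumption)."""
    # Imported lazily so the URL helper works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_single_file(
        checkpoint_url(filename),
        torch_dtype=torch.float16,
        text_encoder_3=None,  # skip the T5 text encoder entirely
    )
    pipe.enable_model_cpu_offload()
    return pipe
```

Dropping T5 reduces memory use, at the likely cost of weaker adherence to long or text-heavy prompts.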
Using Long Prompts with the T5 Text Encoder
We increased the default sequence length for the T5 Text Encoder from a maximum of 77 to 256! It can be adjusted to accept fewer or more tokens by setting the max_sequence_length to a maximum of 512. Keep in mind that longer sequences require additional resources and will result in longer generation times. This effect is particularly noticeable during batch inference.
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus. This imaginative creature features the distinctive, bulky body of a hippo, but with a texture and appearance resembling a golden-brown, crispy waffle. The creature might have elements like waffle squares across its skin and a syrup-like sheen. It's set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting, possibly including oversized utensils or plates in the background. The image should evoke a sense of playful absurdity and culinary fantasy."
image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
Before | max_sequence_length=256 | max_sequence_length=512 |
---|---|---|
![]() | ![]() | ![]() |
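Because longer sequences cost more, it can help to raise the limit only when a prompt actually needs it. The following is a small sketch under that assumption; `choose_max_sequence_length` is a hypothetical helper, and the 256 and 512 constants are the default and maximum stated above.

```python
T5_DEFAULT_MAX_SEQUENCE_LENGTH = 256  # default stated in these release notes
T5_MAX_SEQUENCE_LENGTH = 512          # upper bound stated in these release notes


def choose_max_sequence_length(token_count: int) -> int:
    """Hypothetical helper: pick a max_sequence_length for a prompt.

    Keeps the default for prompts that fit in it, grows to the prompt's
    token count otherwise, and never exceeds the 512-token maximum.
    """
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    if token_count <= T5_DEFAULT_MAX_SEQUENCE_LENGTH:
        return T5_DEFAULT_MAX_SEQUENCE_LENGTH
    return min(token_count, T5_MAX_SEQUENCE_LENGTH)
```

The token count would typically come from the pipeline's T5 tokenizer, for example `len(pipe.tokenizer_3(prompt).input_ids)`, and the result can be passed as `max_sequence_length` in the `pipe(...)` call. Prompts longer than 512 tokens are still truncated.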
All commits
- fix warning log for Transformer SD3 by @sayakpaul in (#8496)
- Add SD3 AutoPipeline mappings by @Beinsezii in (#8489)
- Add Hunyuan AutoPipe mapping by @Beinsezii in (#8505)
- Expand Single File support in SD3 Pipeline by @DN6 in (#8517)
- [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed by @DN6 in (#8462)
- Fix sharding when no device_map is passed by @SunMarc in (#8531)
- [SD3 Inference] T5 Token limit by @asomoza in (#8506)
- Fix gradient checkpointing issue for Stable Diffusion 3 by @Carolinabanana in (#8542)
- Support SD3 ControlNet and Multi-ControlNet. by @wangqixun in (#8566)
- fix from_single_file for checkpoints with t5 by @yiyixuxu in (#8631)
- [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in (#8558)