v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more

Stable Diffusion 3.5 Large

Stability AI’s latest text-to-image generation model is Stable Diffusion 3.5 Large. SD3.5 Large is the next iteration of Stable Diffusion 3. It comes with two checkpoints (both of which have 8B params):

- A regular one
- A timestep-distilled one enabling few-step inference

The model is gated, so fill out the access form on the model page first, then run huggingface-cli login before running the code below.

# make sure to update diffusers
# pip install -U diffusers
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")

See the documentation to learn more.
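The two checkpoints are usually run with different sampler settings. The sketch below keeps them in one place. The regular values come from the example above. The distilled repo id (stabilityai/stable-diffusion-3.5-large-turbo) and its 4-step, guidance-free settings are assumptions that should be checked against the model card. The names SD35Preset, PRESETS, and generate are illustrative and not part of Diffusers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SD35Preset:
    repo_id: str
    num_inference_steps: int
    guidance_scale: float


PRESETS = {
    # Values taken from the release-note example.
    "regular": SD35Preset("stabilityai/stable-diffusion-3.5-large", 40, 4.5),
    # ASSUMPTION: repo id and settings for the timestep-distilled checkpoint;
    # verify both against the model card before relying on them.
    "distilled": SD35Preset("stabilityai/stable-diffusion-3.5-large-turbo", 4, 0.0),
}


def generate(variant: str, prompt: str):
    """Load the chosen checkpoint and return one PIL image (needs a CUDA GPU)."""
    # Imported lazily so the presets can be inspected without the GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    preset = PRESETS[variant]
    pipe = StableDiffusion3Pipeline.from_pretrained(
        preset.repo_id, torch_dtype=torch.bfloat16
    ).to("cuda")
    return pipe(
        prompt=prompt,
        num_inference_steps=preset.num_inference_steps,
        guidance_scale=preset.guidance_scale,
    ).images[0]
```

Under these assumptions, generate("distilled", "a photo of a cat") would trade some prompt adherence for a much shorter sampling loop.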

CogView3-Plus

This release adds a new text-to-image model, CogView3-Plus, from the THUDM team. The model is DiT-based and generates images at resolutions from 512 to 2048 px.

from diffusers import CogView3PlusPipeline
import torch

pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16)

# Reduce GPU memory usage: offload idle components to the CPU (this also
# handles device placement, so no .to("cuda") is needed) and decode in slices/tiles
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."

image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview3.png")
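Because the model targets side lengths between 512 and 2048 px, it can help to normalize user-supplied sizes before calling the pipeline. The helper below is a minimal sketch. The 512 to 2048 range comes from the text above, while the requirement that sides be multiples of 32 is an assumption (check the model card). snap_side and snap_resolution are hypothetical names, not Diffusers APIs.

```python
def snap_side(value: int, multiple: int = 32, lo: int = 512, hi: int = 2048) -> int:
    """Clamp a side length to [lo, hi], then round it down to a multiple.

    ASSUMPTION: `multiple=32` is an illustrative default, not a documented limit.
    """
    if lo % multiple or hi % multiple:
        raise ValueError("lo and hi must themselves be multiples of `multiple`")
    clamped = max(lo, min(hi, value))
    # Rounding down cannot drop below lo, because lo is already a multiple.
    return clamped - clamped % multiple


def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
    """Apply snap_side to both dimensions."""
    return snap_side(width, multiple), snap_side(height, multiple)


if __name__ == "__main__":
    width, height = snap_resolution(1000, 3000)
    print(width, height)
```

With the defaults, a request for 1000 by 3000 becomes 992 by 2048, which can then be passed as width and height to the pipeline call.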

Quantization

Native quantization support has landed in Diffusers, with bitsandbytes as the first quantization backend. With this, we hope large diffusion models become much more accessible on consumer hardware.

The example below shows how to run Flux.1 Dev with the NF4 data type. Install the required libraries first:

pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
pip install -Uq diffusers

from diffusers import BitsAndBytesConfig, FluxTransformer2DModel
import torch

ckpt_id = "black-forest-labs/FLUX.1-dev"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

Then, we use model_nf4 to instantiate the FluxPipeline:

from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    ckpt_id, 
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree.  As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipeline(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")

See the quantization documentation to learn more. Additionally, check out this Colab Notebook that runs Flux.1 Dev end to end with NF4 quantization.
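To get a feel for why NF4 matters on consumer hardware, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below assumes roughly 12B parameters for the Flux.1 Dev transformer (an approximate figure that is not stated above). It ignores quantization metadata, activations, the text encoders, and the VAE, so real usage will be higher.

```python
# Nominal storage cost per parameter; NF4 packs two 4-bit values into one byte.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "nf4": 0.5}


def weight_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    flux_transformer_params = 12e9  # ASSUMPTION: approximate parameter count
    for dtype in ("bfloat16", "int8", "nf4"):
        print(f"{dtype:>8}: {weight_gib(flux_transformer_params, dtype):.1f} GiB")
```

On these assumptions the transformer weights shrink by a factor of four when going from bfloat16 to NF4, which is the difference between exceeding and fitting within the memory of many consumer GPUs.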

Training scripts

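This release comes with a fresh batch of training scripts, among them CogVideoX LoRA training (#9302, #9482), Flux ControlNet training (#9324), and an advanced Flux training script (#9434).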

Video model fine-tuning can be quite expensive, so we have also built a repository, cogvideox-factory, that provides memory-optimized scripts for fine-tuning the Cog family of models.

Misc

- We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.
- Loading of Xlabs Flux ControlNets is also now supported.
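Loading one of these Flux LoRAs goes through the usual pipeline LoRA methods. The helper below is a minimal sketch that wraps load_lora_weights and set_adapters. The function name apply_lora, the default adapter name, and any repo id passed to it are illustrative placeholders, and the snippet assumes the pipeline was created elsewhere.

```python
def apply_lora(pipe, lora_id, adapter_name="style", scale=1.0, weight_name=None):
    """Attach a LoRA to an already-loaded pipeline and set its strength.

    `pipe` is expected to expose `load_lora_weights` and `set_adapters`,
    as Diffusers pipelines with LoRA support do.
    """
    kwargs = {"adapter_name": adapter_name}
    if weight_name is not None:
        # Only needed when the repo holds several weight files.
        kwargs["weight_name"] = weight_name
    pipe.load_lora_weights(lora_id, **kwargs)
    # Activate the adapter and scale its contribution.
    pipe.set_adapters([adapter_name], adapter_weights=[scale])
    return pipe
```

With a loaded FluxPipeline named pipe, a call could look like apply_lora(pipe, "some-user/some-flux-lora", scale=0.8), where the repo id is a placeholder for a Kohya, TheLastBen, or Xlabs checkpoint.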

All commits

Feature flux controlnet img2img and inpaint pipeline by @ighoshsubho in #9408
Remove CogVideoX mentions from single file docs; Test updates by @a-r-r-o-w in #9444
set max_shard_size to None for pipeline save_pretrained by @a-r-r-o-w in #9447
adapt masked im2im pipeline for SDXL by @noskill in #7790
[Flux] add lora integration tests. by @sayakpaul in #9353
[training] CogVideoX Lora by @a-r-r-o-w in #9302
Several fixes to Flux ControlNet pipelines by @vladmandic in #9472
[refactor] LoRA tests by @a-r-r-o-w in #9481
[CI] fix nightly model tests by @sayakpaul in #9483
[Cog] some minor fixes and nits by @sayakpaul in #9466
[Tests] Reduce the model size in the lumina test by @saqlain2204 in #8985
Fix the bug of sd3 controlnet training when using gradient checkpointing. by @pibbo88 in #9498
[Schedulers] Add exponential sigmas / exponential noise schedule by @hlky in #9499
Allow DDPMPipeline half precision by @sbinnee in #9222
Add Noise Schedule/Schedule Type to Schedulers Overview documentation by @hlky in #9504
fix bugs for sd3 controlnet training by @xduzhangjiayu in #9489
[Doc] Fix path and and also import imageio by @LukeLIN-web in #9506
[CI] allow faster downloads from the Hub in CI. by @sayakpaul in #9478
a few fix for SingleFile tests by @yiyixuxu in #9522
Add exponential sigmas to other schedulers and update docs by @hlky in #9518
[Community Pipeline] Batched implementation of Flux with CFG by @sayakpaul in #9513
Update community_projects.md by @lee101 in #9266
[docs] Model sharding by @stevhliu in #9521
update get_parameter_dtype by @yiyixuxu in #9526
[Doc] Improved level of clarity for latents_to_rgb. by @LagPixelLOL in #9529
[Schedulers] Add beta sigmas / beta noise schedule by @hlky in #9509
flux controlnet fix (control_modes batch & others) by @yiyixuxu in #9507
[Tests] Fix ChatGLMTokenizer by @asomoza in #9536
[bug] Precedence of operations in VAE should be slicing -> tiling by @a-r-r-o-w in #9342
[LoRA] make set_adapters() method more robust. by @sayakpaul in #9535
[examples] add train flux-controlnet scripts in example. by @PromeAIpro in #9324
[Tests] [LoRA] clean up the serialization stuff. by @sayakpaul in #9512
[Core] fix variant-identification. by @sayakpaul in #9253
[refactor] remove conv_cache from CogVideoX VAE by @a-r-r-o-w in #9524
[train_instruct_pix2pix.py]Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @AnandK27 in #9316
[chore] fix: retain memory utility. by @sayakpaul in #9543
[LoRA] support Kohya Flux LoRAs that have text encoders as well by @sayakpaul in #9542
Add beta sigmas to other schedulers and update docs by @hlky in #9538
Add PAG support to StableDiffusionControlNetPAGInpaintPipeline by @juancopi81 in #8875
Support bfloat16 for Upsample2D by @darhsu in #9480
fix cogvideox autoencoder decode by @Xiang-cd in #9569
[sd3] make sure height and size are divisible by 16 by @yiyixuxu in #9573
fix xlabs FLUX lora conversion typo by @Clement-Lelievre in #9581
[Chore] add a note on the versions in Flux LoRA integration tests by @sayakpaul in #9598
fix vae dtype when accelerate config using --mixed_precision="fp16" by @xduzhangjiayu in #9601
refac: docstrings in import_utils.py by @yijun-lee in #9583
Fix for use_safetensors parameters, allow use of parameter on loading submodels by @elismasilva in #9576
Update distributed_inference.md to include transformer.device_map by @sayakpaul in #9553
fix: CogVideox train dataset _preprocess_data crop video by @glide-the in #9574
[LoRA] Handle DoRA better by @sayakpaul in #9547
Fixed noise_pred_text referenced before assignment. by @LagPixelLOL in #9537
Fix the bug that joint_attention_kwargs is not passed to the FLUX's transformer attention processors by @HorizonWind2004 in #9517
refac/pipeline_output by @yijun-lee in #9582
[LoRA] allow loras to be loaded with low_cpu_mem_usage. by @sayakpaul in #9510
add PAG support for SD Img2Img by @SahilCarterr in #9463
make controlnet support interrupt by @pureexe in #9620
[LoRA] fix dora test to catch the warning properly. by @sayakpaul in #9627
flux controlnet control_guidance_start and control_guidance_end implement by @ighoshsubho in #9571
fix IsADirectoryError when running the training code for sd3_dreambooth_lora_16gb.ipynb by @alaister123 in #9634
Add Differential Diffusion to Kolors by @saqlain2204 in #9423
FluxMultiControlNetModel by @hlky in #9647
[CI] replace ubuntu version to 22.04. by @sayakpaul in #9656
[docs] Fix xDiT doc image damage by @Eigensystem in #9655
[Tests] increase transformers version in test_low_cpu_mem_usage_with_loading by @sayakpaul in #9662
Flux - soft inpainting via differential diffusion by @ryanlyn in #9268
CogView3Plus DiT by @zRzRzRzRzRzRzR in #9570
Improve the performance and suitable for NPU computing by @leisuzz in #9642
[Community Pipeline] Add 🪆Matryoshka Diffusion Models by @tolgacangoz in #9157
Added Lora Support to SD3 Img2Img Pipeline by @SahilCarterr in #9659
Add pred_original_sample to if not return_dict path by @hlky in #9649
Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel by @hlky in #9652
Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel by @hlky in #9651
Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 by @hlky in #9650
Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral by @hlky in #9616
[Fix] when run load pretain with local_files_only, local variable 'cached_folder' referenced before assignment by @RobinXL in #9376
[Chore] fix import of EntryNotFoundError. by @sayakpaul in #9676
Dreambooth lora flux bug 3dtensor to 2dtensor by @0x-74 in #9653
refactor image_processor.py file by @charchit7 in #9608
[doc] Fix some docstrings in src/diffusers/training_utils.py by @mreraser in #9606
[docs] refactoring docstrings in community/hd_painter.py by @Jwaminju in #9593
[docs] refactoring docstrings in models/embeddings_flax.py by @Jwaminju in #9592
Fix some documentation in ./src/diffusers/models/adapter.py by @ahnjj in #9591
[training] CogVideoX-I2V LoRA by @a-r-r-o-w in #9482
[authored by @Anghellia) Add support of Xlabs Controlnets #9638 by @yiyixuxu in #9687
Docs: CogVideoX by @glide-the in #9578
Resolves [BUG] 'GatheredParameters' object is not callable by @charchit7 in #9614
[LoRA] log a warning when there are missing keys in the LoRA loading. by @sayakpaul in #9622
[SD3 dreambooth-lora training] small updates + bug fixes by @linoytsaban in #9682
[peft] simple update when unscale by @sweetcocoa in #9689
[pipeline] CogVideoX-Fun Control by @a-r-r-o-w in #9671
[core] improve VAE encode/decode framewise batching by @a-r-r-o-w in #9684
[tests] fix name and unskip CogI2V integration test by @a-r-r-o-w in #9683
[Flux] Add advanced training script + support textual inversion inference by @linoytsaban in #9434
[refactor] DiffusionPipeline.download by @a-r-r-o-w in #9557
[advanced flux lora script] minor updates to readme by @linoytsaban in #9705
Fix bug in Textual Inversion Unloading by @bonlime in #9304
Add prompt scheduling callback to community scripts by @hlky in #9718
[CI] pin max torch version to fix CI errors by @a-r-r-o-w in #9709
[Docker] pin torch versions in the dockerfiles. by @sayakpaul in #9721
make deps_table_update to fix CI tests by @a-r-r-o-w in #9720
[Quantization] Add quantization support for bitsandbytes by @sayakpaul in #9213
Fix typo in cogvideo pipeline by @lichenyu20 in #9722
[Docs] docs to xlabs controlnets. by @sayakpaul in #9688
[docs] add docstrings in pipline_stable_diffusion.py by @jeongiin in #9590
minor doc/test update by @yiyixuxu in #9734
[bugfix] reduce float value error when adding noise by @gameofdimension in #9004
fix singlestep dpm tests by @yiyixuxu in #9716
Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models by @tolgacangoz in #9723
Update sd3 controlnet example by @DavyMorgan in #9735
[Fix] Using sharded checkpoints with gated repositories by @asomoza in #9737
[bitsandbbytes] follow-ups by @sayakpaul in #9730
Fix typos by @DN6 in #9739
is_safetensors_compatible fix by @DN6 in #9741
Release: v0.31.0 by @sayakpaul (direct commit on v0.31.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@ighoshsubho
- Feature flux controlnet img2img and inpaint pipeline (#9408)
- flux controlnet control_guidance_start and control_guidance_end implement (#9571)
@noskill
- adapt masked im2im pipeline for SDXL (#7790)
@saqlain2204
- [Tests] Reduce the model size in the lumina test (#8985)
- Add Differential Diffusion to Kolors (#9423)
@hlky
- [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)
- Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)
- Add exponential sigmas to other schedulers and update docs (#9518)
- [Schedulers] Add beta sigmas / beta noise schedule (#9509)
- Add beta sigmas to other schedulers and update docs (#9538)
- FluxMultiControlNetModel (#9647)
- Add pred_original_sample to if not return_dict path (#9649)
- Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel (#9652)
- Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel (#9651)
- Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 (#9650)
- Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral (#9616)
- Add prompt scheduling callback to community scripts (#9718)
@yiyixuxu
- a few fix for SingleFile tests (#9522)
- update get_parameter_dtype (#9526)
- flux controlnet fix (control_modes batch & others) (#9507)
- [sd3] make sure height and size are divisible by 16 (#9573)
- [authored by @Anghellia) Add support of Xlabs Controlnets #9638 (#9687)
- minor doc/test update (#9734)
- fix singlestep dpm tests (#9716)
@PromeAIpro
- [examples] add train flux-controlnet scripts in example. (#9324)
@juancopi81
- Add PAG support to StableDiffusionControlNetPAGInpaintPipeline (#8875)
@glide-the
- fix: CogVideox train dataset _preprocess_data crop video (#9574)
- Docs: CogVideoX (#9578)
@SahilCarterr
- add PAG support for SD Img2Img (#9463)
- Added Lora Support to SD3 Img2Img Pipeline (#9659)
@ryanlyn
- Flux - soft inpainting via differential diffusion (#9268)
@zRzRzRzRzRzRzR
- CogView3Plus DiT (#9570)
@tolgacangoz
- [Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157)
- Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models (#9723)
@linoytsaban
- [SD3 dreambooth-lora training] small updates + bug fixes (#9682)
- [Flux] Add advanced training script + support textual inversion inference (#9434)
- [advanced flux lora script] minor updates to readme (#9705)

huggingface/diffusers v0.31.0 on GitHub