New Pipelines for Video Generation
Wan 2.1
Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. The release includes four model variants and three pipelines: Text-to-Video, Image-to-Video, and Video-to-Video.
- `Wan-AI/Wan2.1-T2V-1.3B-Diffusers`
- `Wan-AI/Wan2.1-T2V-14B-Diffusers`
- `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers`
- `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers`
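Below is a minimal text-to-video sketch with the 1.3B variant; the prompt and generation settings are illustrative rather than prescriptive:

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Load the 1.3B text-to-video variant in bfloat16
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

video = pipe(prompt="A cat walks on the grass, realistic", num_frames=81).frames[0]
export_to_video(video, "wan_t2v.mp4", fps=16)
```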
Check out the docs here to learn more.
LTX Video 0.9.5
LTX Video 0.9.5 is the latest release in the super-fast LTX Video model series. It introduces additional conditioning options, such as keyframe-based animation and video extension (both forward and backward).
To support these additional conditioning inputs, we've introduced the `LTXConditionPipeline` and the `LTXVideoCondition` object.
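As a rough sketch of the new conditioning API, here is what conditioning the first frame on a keyframe image could look like (the image URL, prompt, and settings below are placeholders):

```python
import torch
from diffusers import LTXConditionPipeline
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_image

pipe = LTXConditionPipeline.from_pretrained("Lightricks/LTX-Video-0.9.5", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Use an image as the keyframe for the first frame of the video
image = load_image("https://example.com/first_frame.png")  # placeholder image URL
condition = LTXVideoCondition(image=image, frame_index=0)

video = pipe(
    conditions=[condition],
    prompt="A cinematic shot of a boat sailing at sunset",  # illustrative prompt
    num_frames=97,
    num_inference_steps=30,
).frames[0]
export_to_video(video, "ltx_condition.mp4", fps=24)
```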
To learn more about the usage, check out the docs here.
Hunyuan Image to Video
Hunyuan utilizes a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder. The input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data and seamlessly integrating information from both the image and its associated caption.
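A minimal image-to-video sketch using the community checkpoint (the input image URL and prompt are placeholders):

```python
import torch
from diffusers import HunyuanVideoImageToVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video, load_image

model_id = "hunyuanvideo-community/HunyuanVideo-I2V"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoImageToVideoPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.float16
)
pipe.vae.enable_tiling()
pipe.to("cuda")

image = load_image("https://example.com/guitar-man.png")  # placeholder image URL
video = pipe(image=image, prompt="A man strums a red electric guitar", num_frames=61).frames[0]
export_to_video(video, "hunyuan_i2v.mp4", fps=15)
```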
To learn more, check out the docs here.
Others
- EasyAnimateV5 (thanks to @bubbliiiing for contributing this in this PR)
- ConsisID (thanks to @SHYuanBest for contributing this in this PR)
New Pipelines for Image Generation
SANA-Sprint
SANA-Sprint is an efficient diffusion model for ultra-fast text-to-image generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4, rivaling the quality of models like Flux.
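A minimal sketch, assuming the `Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers` checkpoint; note the very low step count:

```python
import torch
from diffusers import SanaSprintPipeline

pipe = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# SANA-Sprint works with as few as 1-4 inference steps
image = pipe(prompt="a tiny astronaut hatching from an egg on the moon", num_inference_steps=2).images[0]
image.save("sana_sprint.png")
```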
Shoutout to @lawrence-cj for their help and guidance on this PR.
Check out the pipeline docs of SANA-Sprint to learn more.
Lumina2
Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.
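A minimal text-to-image sketch with an illustrative prompt:

```python
import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("Alpha-VLLM/Lumina-Image-2.0", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(prompt="a serene mountain lake at sunrise, photorealistic").images[0]
image.save("lumina2.png")
```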
Check out the docs to learn more. Thanks to @zhuole1025 for contributing this through this PR.
One can also LoRA fine-tune Lumina2, taking advantage of its Apache 2.0 licensing. Check out the guide for more details.
Omnigen
OmniGen is a unified image generation model that can handle multiple tasks including text-to-image, image editing, subject-driven generation, and various computer vision tasks within a single framework. The model consists of a VAE, and a single transformer based on Phi-3 that handles text and image encoding as well as the diffusion process.
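A minimal text-to-image sketch (the prompt and guidance scale are illustrative); OmniGen also accepts input images for its editing and subject-driven tasks:

```python
import torch
from diffusers import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1-diffusers", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Plain text-to-image; input images can additionally be passed for editing tasks
image = pipe(prompt="A curly-haired man in a red shirt is drinking tea.", guidance_scale=2.5).images[0]
image.save("omnigen.png")
```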
Check out the docs to learn more about OmniGen. Thanks to @staoxiao for contributing OmniGen in this PR.
Others
- CogView4 (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in this PR)
New Memory Optimizations
Layerwise Casting
PyTorch supports `torch.float8_e4m3fn` and `torch.float8_e5m2` as weight storage dtypes, but they can't be used for computation on many devices due to unimplemented kernel support. However, you can still use these dtypes to store model weights in FP8 precision and upcast them on the fly to a widely supported dtype such as `torch.float16` or `torch.bfloat16` when the layers are used in the forward pass. This is known as layerwise weight casting. It can potentially cut down the VRAM requirements of a model by 50%.
Code
```python
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX-5b"

# Load the model in bfloat16 and enable layerwise casting
transformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)

# Load the pipeline
pipe = CogVideoXPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
```
Group Offloading
Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.
On CUDA devices, we also have the option to enable layer prefetching with CUDA streams: the next layer to be executed is loaded onto the accelerator device while the current layer is being executed, which makes inference substantially faster while still keeping VRAM requirements very low. This overlaps computation with data transfer.
One thing to note is that using CUDA streams can cause a considerable spike in CPU RAM usage. Please ensure that the available CPU RAM is 2x the size of the model if you choose to set `use_stream=True`. You can reduce CPU RAM usage by setting `low_cpu_mem_usage=True`. This should limit the CPU RAM used to roughly the size of the model, but will introduce slight latency in the inference process. You can also pass `record_stream=True` along with `use_stream=True` to obtain more speedups at the expense of slightly increased memory usage.
Code
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# We can utilize the enable_group_offload method for Diffusers model implementations
pipe.transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]

# This utilized about 14.79 GB. It can be further reduced by using tiling and leaf_level offloading throughout the pipeline.
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
export_to_video(video, "output.mp4", fps=8)
```
Group offloading can also be applied to non-Diffusers models, such as text encoders from the `transformers` library.
Code
```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# For any other model implementations, the apply_group_offloading function can be used
apply_group_offloading(pipe.text_encoder, onload_device=onload_device, offload_type="block_level", num_blocks_per_group=2)
```
Remote Components
Remote components are an experimental feature designed to offload memory-intensive steps of the inference pipeline to remote endpoints. The initial implementation focuses primarily on VAE decoding operations. Below are the currently supported model endpoints:
| Model | Endpoint | VAE |
| --- | --- | --- |
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | `stabilityai/sd-vae-ft-mse` |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | `madebyollin/sdxl-vae-fp16-fix` |
| Flux | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | `black-forest-labs/FLUX.1-schnell` |
| HunyuanVideo | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | `hunyuanvideo-community/HunyuanVideo` |
This is an example of using remote decoding with the Hunyuan Video pipeline:
Code
```python
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils.remote_utils import remote_decode

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, vae=None, torch_dtype=torch.float16
).to("cuda")

latent = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    output_type="latent",
).frames

video = remote_decode(
    endpoint="https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    output_type="mp4",
)

if isinstance(video, bytes):
    with open("video.mp4", "wb") as f:
        f.write(video)
```
Check out the docs to learn more.
Introducing Cached Inference for DiTs
Cached Inference for Diffusion Transformer models is a performance optimization that significantly accelerates the denoising process by caching intermediate values. This technique reduces redundant computations across timesteps, resulting in faster generation with a slight dip in output quality.
Check out the docs to learn more about the available caching methods.
Pyramid Attention Broadcast
Code
```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```
FasterCache
Code
```python
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 901),
    unconditional_batch_skip_range=2,
    attention_weight_callback=lambda _: 0.5,
    is_guidance_distilled=True,
)
pipe.transformer.enable_cache(config)
```
Quantization
Quanto Backend
Diffusers now has support for the Quanto quantization backend, which provides `float8`, `int8`, `int4`, and `int2` quantization dtypes.
```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```
Quanto `int8` models are also compatible with `torch.compile`:
Code
```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="int8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
transformer.compile()
```
Improved loading for uintx TorchAO checkpoints with torch>=2.6
TorchAO checkpoints currently have to be serialized using pickle. For some quantization dtypes that use the `uintx` format, such as `uint4wo`, this involves saving subclassed TorchAO Tensor objects in the model file. This made loading the models directly with Diffusers tricky, since we do not allow deserializing arbitrary Python objects from pickle files.
Torch 2.6 allows adding expected Tensor subclasses to torch's safe globals, which lets us directly load TorchAO checkpoints containing these objects.
```diff
- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")
```
LoRAs
We have shipped a couple of improvements on the LoRA front in this release.
- Improved coverage for loading non-diffusers LoRA checkpoints
- Loading LoRAs into quantized model checkpoints
Also, take note of the breaking change introduced in this PR. 🚨 We suggest upgrading your `peft` installation to the latest version (`pip install -U peft`), especially when dealing with Flux LoRAs.
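For example, loading a LoRA on top of a quantized base model might look like the following sketch, assuming a bitsandbytes 4-bit Flux transformer; the LoRA repo is a placeholder:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# Quantize the Flux transformer to 4-bit with bitsandbytes
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# LoRAs can now be loaded on top of the quantized transformer
pipe.load_lora_weights("<flux-lora-repo-or-path>")  # placeholder: substitute a real Flux LoRA
```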
`dtype` Maps for Pipelines
Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:
```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # torch.bfloat16 torch.float16
```
AutoModel
This release includes an `AutoModel` class, similar to the one found in `transformers`, that automatically fetches the appropriate model class for the provided repo.
```python
from diffusers import AutoModel

unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
```
All commits
- [Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
- IP-Adapter for `StableDiffusion3Img2ImgPipeline` by @guiyrt in #10589
- [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
- Move buffers to device by @hlky in #10523
- [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
- Scheduling fixes on MPS by @hlky in #10549
- [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
- NPU adaption for RMSNorm by @leisuzz in #10534
- implementing flux on TPUs with ptxla by @entrpn in #10515
- [core] ConsisID by @SHYuanBest in #10140
- [training] set rest of the blocks with `requires_grad` False. by @sayakpaul in #10607
- chore: remove redundant words by @sunxunle in #10609
- bugfix for npu not support float64 by @baymax591 in #10123
- [chore] change licensing to 2025 from 2024. by @sayakpaul in #10615
- Enable dreambooth lora finetune example on other devices by @jiqing-feng in #10602
- Remove the FP32 Wrapper when evaluating by @lmxyy in #10617
- [tests] make tests device-agnostic (part 3) by @faaany in #10437
- fix offload gpu tests etc by @yiyixuxu in #10366
- Remove cache migration script by @Wauplin in #10619
- [core] Layerwise Upcasting by @a-r-r-o-w in #10347
- Improve TorchAO error message by @a-r-r-o-w in #10627
- [CI] Update HF_TOKEN in all workflows by @DN6 in #10613
- add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py by @kahmed10 in #10624
- [Tests] modify the test slices for the failing flax test by @sayakpaul in #10630
- [docs] fix image path in para attention docs by @sayakpaul in #10632
- [docs] uv installation by @stevhliu in #10622
- width and height are mixed-up by @raulc0399 in #10629
- Add IP-Adapter example to Flux docs by @hlky in #10633
- removing redundant requires_grad = False by @YanivDorGalron in #10628
- [chore] add a script to extract loras from full fine-tuned models by @sayakpaul in #10631
- Add pipeline_stable_diffusion_xl_attentive_eraser by @Anonym0u3 in #10579
- NPU Adaption for Sanna by @leisuzz in #10409
- Add sigmoid scheduler in `scheduling_ddpm.py` docs by @JacobHelwig in #10648
- create a script to train autoencoderkl by @lavinal712 in #10605
- Add community pipeline for semantic guidance for FLUX by @Marlon154 in #10610
- ControlNet Union controlnet_conditioning_scale for multiple control inputs by @hlky in #10666
- [training] Convert to ImageFolder script by @hlky in #10664
- Add provider_options to OnnxRuntimeModel by @hlky in #10661
- fix check_inputs func in LuminaText2ImgPipeline by @victolee0 in #10651
- SDXL ControlNet Union pipelines, make control_image argument immutible by @Teriks in #10663
- Revert RePaint scheduler 'fix' by @GiusCat in #10644
- [core] Pyramid Attention Broadcast by @a-r-r-o-w in #9562
- [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode by @hanchchch in #10600
- Refactor gradient checkpointing by @a-r-r-o-w in #10611
- [Tests] conditionally check `fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory` by @sayakpaul in #10669
- Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode by @dimitribarbot in #10670
- [tests] update llamatokenizer in hunyuanvideo tests by @sayakpaul in #10681
- support StableDiffusionAdapterPipeline.from_single_file by @Teriks in #10552
- fix(hunyuan-video): typo in height and width input check by @badayvedat in #10684
- [FIX] check_inputs function in Auraflow Pipeline by @SahilCarterr in #10678
- Fix enable memory efficient attention on ROCm by @tenpercent in #10564
- Fix inconsistent random transform in instruct pix2pix by @Luvata in #10698
- feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling by @badayvedat in #10699
- Fixed grammar in "write_own_pipeline" readme by @N0-Flux-given in #10706
- Fix Documentation about Image-to-Image Pipeline by @ParagEkbote in #10704
- [bitsandbytes] Simplify bnb int8 dequant by @sayakpaul in #10401
- Fix train_text_to_image.py --help by @nkthiebaut in #10711
- Notebooks for Community Scripts-6 by @ParagEkbote in #10713
- [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference by @SahilCarterr in #10714
- add provider_options in from_pretrained by @xieofxie in #10719
- [Community] Enhanced `Model Search` by @suzukimain in #10417
- [bugfix] NPU Adaption for Sana by @leisuzz in #10724
- Quantized Flux with IP-Adapter by @hlky in #10728
- EDMEulerScheduler accept sigmas, add final_sigmas_type by @hlky in #10734
- [LoRA] fix peft state dict parsing by @sayakpaul in #10532
- Add `Self` type hint to `ModelMixin`'s `from_pretrained` by @hlky in #10742
- [Tests] Test layerwise casting with training by @sayakpaul in #10765
- speedup hunyuan encoder causal mask generation by @dabeschte in #10764
- [CI] Fix Truffle Hog failure by @DN6 in #10769
- Add OmniGen by @staoxiao in #10148
- feat: new community mixture_tiling_sdxl pipeline for SDXL by @elismasilva in #10759
- Add support for lumina2 by @zhuole1025 in #10642
- Refactor OmniGen by @a-r-r-o-w in #10771
- Faster set_adapters by @Luvata in #10777
- [Single File] Add Single File support for Lumina Image 2.0 Transformer by @DN6 in #10781
- Fix `use_lu_lambdas` and `use_karras_sigmas` with `beta_schedule=squaredcos_cap_v2` in `DPMSolverMultistepScheduler` by @hlky in #10740
- `MultiControlNetUnionModel` on SDXL by @guiyrt in #10747
- fix: [Community pipeline] Fix flattened elements on image by @elismasilva in #10774
- make tensors contiguous before passing to safetensors by @faaany in #10761
- Disable PEFT input autocast when using fp8 layerwise casting by @a-r-r-o-w in #10685
- Update FlowMatch docstrings to mention correct output classes by @a-r-r-o-w in #10788
- Refactor CogVideoX transformer forward by @a-r-r-o-w in #10789
- Module Group Offloading by @a-r-r-o-w in #10503
- Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791 by @puhuk in #10792
- [FIX] check_inputs function in lumina2 by @SahilCarterr in #10784
- follow-up refactor on lumina2 by @yiyixuxu in #10776
- CogView4 (supports different length c and uc) by @zRzRzRzRzRzRzR in #10649
- typo fix by @YanivDorGalron in #10802
- Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines by @ParagEkbote in #10746
- [chore] update notes generation spaces by @sayakpaul in #10592
- [LoRA] improve lora support for flux. by @sayakpaul in #10810
- Fix max_shift value in flux and related functions to 1.15 (issue #10675) by @puhuk in #10807
- [docs] add missing entries to the lora docs. by @sayakpaul in #10819
- `DiffusionPipeline` mixin `to` + `FromOriginalModelMixin`/`FromSingleFileMixin` `from_single_file` type hint by @hlky in #10811
- [LoRA] make `set_adapters()` robust on silent failures. by @sayakpaul in #9618
- [FEAT] Model loading refactor by @SunMarc in #10604
- [misc] feat: introduce a style bot. by @sayakpaul in #10274
- Remove print statements by @a-r-r-o-w in #10836
- [tests] use proper gemma class and config in lumina2 tests. by @sayakpaul in #10828
- [LoRA] add LoRA support to Lumina2 and fine-tuning script by @sayakpaul in #10818
- [Utils] add utilities for checking if certain utilities are properly documented by @sayakpaul in #7763
- Add missing `isinstance` for arg checks in GGUFParameter by @AstraliteHeart in #10834
- [tests] test `encode_prompt()` in isolation by @sayakpaul in #10438
- store activation cls instead of function by @SunMarc in #10832
- fix: support transformer models' `generation_config` in pipeline by @JeffersonQin in #10779
- Notebooks for Community Scripts-7 by @ParagEkbote in #10846
- [CI] install accelerate transformers from `main` by @sayakpaul in #10289
- [CI] run fast gpu tests conditionally on pull requests. by @sayakpaul in #10310
- SD3 IP-Adapter runtime checkpoint conversion by @guiyrt in #10718
- Some consistency-related fixes for HunyuanVideo by @a-r-r-o-w in #10835
- SkyReels Hunyuan T2V & I2V by @a-r-r-o-w in #10837
- fix: run tests from a pr workflow. by @sayakpaul in #9696
- [chore] template for remote vae. by @sayakpaul in #10849
- fix remote vae template by @sayakpaul in #10852
- [CI] Fix incorrectly named test module for Hunyuan DiT by @DN6 in #10854
- [CI] Update always test Pipelines list in Pipeline fetcher by @DN6 in #10856
- `device_map` in `load_model_dict_into_meta` by @hlky in #10851
- [Fix] Docs overview.md by @SahilCarterr in #10858
- remove format check for safetensors file by @SunMarc in #10864
- [docs] LoRA support by @stevhliu in #10844
- Comprehensive type checking for `from_pretrained` kwargs by @guiyrt in #10758
- Fix `torch_dtype` in Kolors text encoder with `transformers` v4.49 by @hlky in #10816
- [LoRA] restrict certain keys to be checked for peft config update. by @sayakpaul in #10808
- Add SD3 ControlNet to AutoPipeline by @hlky in #10888
- [docs] Update prompt weighting docs by @stevhliu in #10843
- [docs] Flux group offload by @stevhliu in #10847
- [Fix] fp16 unscaling in train_dreambooth_lora_sdxl by @SahilCarterr in #10889
- [docs] Add CogVideoX Schedulers by @a-r-r-o-w in #10885
- [chore] correct qk norm list. by @sayakpaul in #10876
- [Docs] Fix toctree sorting by @DN6 in #10894
- [refactor] SD3 docs & remove additional code by @a-r-r-o-w in #10882
- [refactor] Remove additional Flux code by @a-r-r-o-w in #10881
- [CI] Improvements to conditional GPU PR tests by @DN6 in #10859
- Multi IP-Adapter for Flux pipelines by @guiyrt in #10867
- Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing "controlnet_image". by @CyberVy in #10880
- Security fix by @ydshieh in #10905
- Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation by @toshas in #10884
- [Tests] fix: lumina2 lora fuse_nan test by @sayakpaul in #10911
- Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. by @CyberVy in #10907
- [CI] Fix Fast GPU tests on PR by @DN6 in #10912
- [CI] Fix for failing IP Adapter test in Fast GPU PR tests by @DN6 in #10915
- Experimental per control type scale for ControlNet Union by @hlky in #10723
- [style bot] improve security for the stylebot. by @sayakpaul in #10908
- [CI] Update Stylebot Permissions by @DN6 in #10931
- [Alibaba Wan Team] continue on #10921 Wan2.1 by @yiyixuxu in #10922
- Support IPAdapter for more Flux pipelines by @hlky in #10708
- Add `remote_decode` to `remote_utils` by @hlky in #10898
- Update VAE Decode endpoints by @hlky in #10939
- [chore] fix-copies to flux pipelines by @sayakpaul in #10941
- [Tests] Remove more encode prompts tests by @sayakpaul in #10942
- Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model by @bubbliiiing in #10626
- Fix SD2.X clip single file load projection_dim by @Teriks in #10770
- add from_single_file to animatediff by @ in #10924
- Add Example of IPAdapterScaleCutoffCallback to Docs by @ParagEkbote in #10934
- Update pipeline_cogview4.py by @zRzRzRzRzRzRzR in #10944
- Fix redundant prev_output_channel assignment in UNet2DModel by @ahmedbelgacem in #10945
- Improve load_ip_adapter RAM Usage by @CyberVy in #10948
- [tests] make tests device-agnostic (part 4) by @faaany in #10508
- Update evaluation.md by @sayakpaul in #10938
- [LoRA] feat: support non-diffusers lumina2 LoRAs. by @sayakpaul in #10909
- [Quantization] support pass MappingType for TorchAoConfig by @a120092009 in #10927
- Fix the missing parentheses when calling is_torchao_available in quantization_config.py. by @CyberVy in #10961
- [LoRA] Support Wan by @a-r-r-o-w in #10943
- Fix incorrect seed initialization when args.seed is 0 by @azolotenkov in #10964
- feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL by @elismasilva in #10951
- [Docs] CogView4 comment fix by @zRzRzRzRzRzRzR in #10957
- update check_input for cogview4 by @yiyixuxu in #10966
- Add VAE Decode endpoint slow test by @hlky in #10946
- [flux lora training] fix t5 training bug by @linoytsaban in #10845
- use style bot GH Action from `huggingface_hub` by @hanouticelina in #10970
- [train_dreambooth_lora.py] Fix the LR Schedulers when `num_train_epochs` is passed in a distributed training env by @flyxiv in #10973
- [tests] fix tests for save load components by @sayakpaul in #10977
- Fix loading OneTrainer Flux LoRA by @hlky in #10978
- fix default values of Flux guidance_scale in docstrings by @catwell in #10982
- [CI] remove synchornized. by @sayakpaul in #10980
- Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill by @dependabot[bot] in #10984
- Fix Flux Controlnet Pipeline _callback_tensor_inputs Missing Some Elements by @CyberVy in #10974
- [Single File] Add user agent to SF download requests. by @DN6 in #10979
- Add CogVideoX DDIM Inversion to Community Pipelines by @LittleNyima in #10956
- fix wan i2v pipeline bugs by @yupeng1111 in #10975
- Hunyuan I2V by @a-r-r-o-w in #10983
- Fix Graph Breaks When Compiling CogView4 by @chengzeyi in #10959
- Wan VAE move scaling to pipeline by @hlky in #10998
- [LoRA] remove full key prefix from peft. by @sayakpaul in #11004
- [Single File] Add single file support for Wan T2V/I2V by @DN6 in #10991
- Add STG to community pipelines by @kinam0252 in #10960
- [LoRA] Improve copied from comments in the LoRA loader classes by @sayakpaul in #10995
- Fix for fetching variants only by @DN6 in #10646
- [Quantization] Add Quanto backend by @DN6 in #10756
- [Single File] Add single file loading for SANA Transformer by @ishan-modi in #10947
- [LoRA] Improve warning messages when LoRA loading becomes a no-op by @sayakpaul in #10187
- [LoRA] CogView4 by @a-r-r-o-w in #10981
- [Tests] improve quantization tests by additionally measuring the inference memory savings by @sayakpaul in #11021
- [`Research Project`] Add AnyText: Multilingual Visual Text Generation And Editing by @tolgacangoz in #8998
- [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 by @DN6 in #11018
- fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings by @elismasilva in #11012
- [LoRA] support wan i2v loras from the world. by @sayakpaul in #11025
- Fix SD3 IPAdapter feature extractor by @hlky in #11027
- chore: fix help messages in advanced diffusion examples by @wonderfan in #10923
- Fix missing **kwargs in lora_pipeline.py by @CyberVy in #11011
- Fix for multi-GPU WAN inference by @AmericanPresidentJimmyCarter in #10997
- [Refactor] Clean up import utils boilerplate by @DN6 in #11026
- Use `output_size` in `repeat_interleave` by @hlky in #11030
- [hybrid inference 🍯🐝] Add VAE encode by @hlky in #11017
- Wan Pipeline scaling fix, type hint warning, multi generator fix by @hlky in #11007
- [LoRA] change to warning from info when notifying the users about a LoRA no-op by @sayakpaul in #11044
- Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline by @hlky in #10827
- making `formatted_images` initialization compact by @YanivDorGalron in #10801
- Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed by @ZhengKai91 in #10820
- [Tests] restrict memory tests for quanto for certain schemes. by @sayakpaul in #11052
- [LoRA] feat: support non-diffusers wan t2v loras. by @sayakpaul in #11059
- [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch by @andjoer in #11051
- reverts accidental change that removes attn_mask in attn. Improves fl… by @entrpn in #11065
- Fix deterministic issue when getting pipeline dtype and device by @dimitribarbot in #10696
- [Tests] add requires peft decorator. by @sayakpaul in #11037
- CogView4 Control Block by @zRzRzRzRzRzRzR in #10809
- [CI] pin transformers version for benchmarking. by @sayakpaul in #11067
- Fix Wan I2V Quality by @chengzeyi in #11087
- LTX 0.9.5 by @a-r-r-o-w in #10968
- make PR GPU tests conditioned on styling. by @sayakpaul in #11099
- Group offloading improvements by @a-r-r-o-w in #11094
- Fix pipeline_flux_controlnet.py by @co63oc in #11095
- update readme instructions. by @entrpn in #11096
- Resolve stride mismatch in UNet's ResNet to support Torch DDP by @jinc7461 in #11098
- Fix Group offloading behaviour when using streams by @a-r-r-o-w in #11097
- Quality options in `export_to_video` by @hlky in #11090
- [CI] uninstall deps properly from pr gpu tests. by @sayakpaul in #11102
- [BUG] Fix Autoencoderkl train script by @lavinal712 in #11113
- [Wan LoRAs] make T2V LoRAs compatible with Wan I2V by @linoytsaban in #11107
- [tests] enable bnb tests on xpu by @faaany in #11001
- [fix bug] PixArt inference_steps=1 by @lawrence-cj in #11079
- Flux with Remote Encode by @hlky in #11091
- [tests] make cuda only tests device-agnostic by @faaany in #11058
- Provide option to reduce CPU RAM usage in Group Offload by @DN6 in #11106
- remove F.rms_norm for now by @yiyixuxu in #11126
- Notebooks for Community Scripts-8 by @ParagEkbote in #11128
- fix _callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements by @CyberVy in #11073
- [core] FasterCache by @a-r-r-o-w in #10163
- add sana-sprint by @yiyixuxu in #11074
- Don't override `torch_dtype` and don't use when `quantization_config` is set by @hlky in #11039
- Update README and example code for AnyText usage by @tolgacangoz in #11028
- Modify the implementation of retrieve_timesteps in CogView4-Control. by @zRzRzRzRzRzRzR in #11125
- [fix SANA-Sprint] by @lawrence-cj in #11142
- New HunyuanVideo-I2V by @a-r-r-o-w in #11066
- [doc] Fix Korean Controlnet Train doc by @flyxiv in #11141
- Improve information about group offloading and layerwise casting by @a-r-r-o-w in #11101
- add a timestep scale for sana-sprint teacher model by @lawrence-cj in #11150
- [Quantization] dtype fix for GGUF + fix BnB tests by @DN6 in #11159
- Set self._hf_peft_config_loaded to True when LoRA is loaded using `load_lora_adapter` in PeftAdapterMixin class by @kentdan3msu in #11155
- WanI2V encode_image by @hlky in #11164
- [Docs] Update Wan Docs with memory optimizations by @DN6 in #11089
- Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions by @hlky in #11139
- Raise warning and round down if Wan num_frames is not 4k + 1 by @a-r-r-o-w in #11167
- [Docs] Fix environment variables in `installation.md` by @remarkablemark in #11179
- Add `latents_mean` and `latents_std` to `SDXLLongPromptWeightingPipeline` by @hlky in #11034
- Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set by @kakukakujirori in #10918
- [tests] no hard-coded cuda by @faaany in #11186
- [WIP] Add Wan Video2Video by @DN6 in #11053
- map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU by @yao-matrix in #11191
- fix autocast by @jiqing-feng in #11190
- fix: for checking mandatory and optional pipeline components by @elismasilva in #11189
- remove unnecessary call to `F.pad` by @bm-synth in #10620
- allow models to run with a user-provided dtype map instead of a single dtype by @hlky in #10301
- [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU by @faaany in #11197
- Revert `save_model` in ModelMixin save_pretrained and use safe_serialization=False in test by @hlky in #11196
- [docs] `torch_dtype` map by @hlky in #11194
- Fix enable_sequential_cpu_offload in CogView4Pipeline by @hlky in #11195
- SchedulerMixin from_pretrained and ConfigMixin Self type annotation by @hlky in #11192
- Update import_utils.py by @Lakshaysharma048 in #10329
- Add CacheMixin to Wan and LTX Transformers by @DN6 in #11187
- feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline by @elismasilva in #11188
- [Model Card] standardize advanced diffusion training sdxl lora by @chiral-carbon in #7615
- Change KolorsPipeline LoRA Loader to StableDiffusion by @BasileLewan in #11198
- Update Style Bot workflow by @hanouticelina in #11202
- Fixed requests.get function call by adding timeout parameter. by @kghamilton89 in #11156
- Fix Single File loading for LTX VAE by @DN6 in #11200
- [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) by @Suprhimp in #10603
- [LTX0.9.5] Refactor `LTXConditionPipeline` for text-only conditioning by @tolgacangoz in #11174
- Add Wan with STG as a community pipeline by @Ednaordinary in #11184
- Add missing MochiEncoder3D.gradient_checkpointing attribute by @mjkvaak-amd in #11146
- enable 1 case on XPU by @yao-matrix in #11219
- ensure dtype match between diffused latents and vae weights by @heyalexchoi in #8391
- [docs] MPS update by @stevhliu in #11212
- Add support to pass image embeddings to the WAN I2V pipeline. by @goiri in #11175
- [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8461
- [Training] Better image interpolation in training scripts by @asomoza in #11206
- [LoRA] Implement hot-swapping of LoRA by @BenjaminBossan in #9453
- introduce compute arch specific expectations and fix test_sd3_img2img_inference failure by @yao-matrix in #11227
- [Flux LoRA] fix issues in flux lora scripts by @linoytsaban in #11111
- Flux quantized with lora by @hlky in #10990
- [feat] implement `record_stream` when using CUDA streams during group offloading by @sayakpaul in #11081
- [bitsandbytes] improve replacement warnings for bnb by @sayakpaul in #11132
- minor update to sana sprint docs. by @sayakpaul in #11236
- [docs] minor updates to dtype map docs. by @sayakpaul in #11237
- [LoRA] support more comfyui loras for Flux 🚨 by @sayakpaul in #10985
- fix: SD3 ControlNet validation so that it runs on a A100. by @sayakpaul in #11238
- AudioLDM2 Fixes by @hlky in #11244
- AutoModel by @hlky in #11115
- fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU by @yao-matrix in #11245
- [docs] AutoModel by @hlky in #11250
- Update Ruff to latest Version by @DN6 in #10919
- fix flux controlnet bug by @free001style in #11152
- fix timeout constant by @sayakpaul in #11252
- fix consisid imports by @sayakpaul in #11254
- Release: v0.33.0 by @sayakpaul (direct commit on v0.33.0-release)
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @guiyrt
  - IP-Adapter for `StableDiffusion3Img2ImgPipeline` (#10589)
  - [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
  - `MultiControlNetUnionModel` on SDXL (#10747)
  - SD3 IP-Adapter runtime checkpoint conversion (#10718)
  - Comprehensive type checking for `from_pretrained` kwargs (#10758)
  - Multi IP-Adapter for Flux pipelines (#10867)
- @chengzeyi
- @entrpn
- @SHYuanBest
  - [core] ConsisID (#10140)
- @faaany
  - [tests] make tests device-agnostic (part 3) (#10437)
  - make tensors contiguous before passing to safetensors (#10761)
  - [tests] make tests device-agnostic (part 4) (#10508)
  - [tests] enable bnb tests on xpu (#11001)
  - [tests] make cuda only tests device-agnostic (#11058)
  - [tests] no hard-coded cuda (#11186)
  - [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
- @yiyixuxu
- @DN6
  - [CI] Update HF_TOKEN in all workflows (#10613)
  - [CI] Fix Truffle Hog failure (#10769)
  - [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)
  - [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)
  - [CI] Update always test Pipelines list in Pipeline fetcher (#10856)
  - [Docs] Fix toctree sorting (#10894)
  - [CI] Improvements to conditional GPU PR tests (#10859)
  - [CI] Fix Fast GPU tests on PR (#10912)
  - [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)
  - [CI] Update Stylebot Permissions (#10931)
  - [Single File] Add user agent to SF download requests. (#10979)
  - [Single File] Add single file support for Wan T2V/I2V (#10991)
  - Fix for fetching variants only (#10646)
  - [Quantization] Add Quanto backend (#10756)
  - [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
  - [Refactor] Clean up import utils boilerplate (#11026)
  - Provide option to reduce CPU RAM usage in Group Offload (#11106)
  - [Quantization] dtype fix for GGUF + fix BnB tests (#11159)
  - [Docs] Update Wan Docs with memory optimizations (#11089)
  - [WIP] Add Wan Video2Video (#11053)
  - Add CacheMixin to Wan and LTX Transformers (#11187)
  - Fix Single File loading for LTX VAE (#11200)
  - Update Ruff to latest Version (#10919)
- @Anonym0u3
  - Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)
- @lavinal712
- @Marlon154
  - Add community pipeline for semantic guidance for FLUX (#10610)
- @ParagEkbote
  - Fix Documentation about Image-to-Image Pipeline (#10704)
  - Notebooks for Community Scripts-6 (#10713)
  - Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)
  - Notebooks for Community Scripts-7 (#10846)
  - Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)
  - Notebooks for Community Scripts-8 (#11128)
- @suzukimain
  - [Community] Enhanced `Model Search` (#10417)
- @staoxiao
  - Add OmniGen (#10148)
- @elismasilva
  - feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)
  - fix: [Community pipeline] Fix flattened elements on image (#10774)
  - feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)
  - fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012)
  - fix: for checking mandatory and optional pipeline components (#11189)
  - feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
- @zhuole1025
  - Add support for lumina2 (#10642)
- @zRzRzRzRzRzRzR
- @toshas
  - Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)
- @bubbliiiing
  - Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626)
- @LittleNyima
  - Add CogVideoX DDIM Inversion to Community Pipelines (#10956)
- @kinam0252
  - Add STG to community pipelines (#10960)
- @tolgacangoz
- @Ednaordinary
  - Add Wan with STG as a community pipeline (#11184)