github huggingface/diffusers v0.36.0
Diffusers 0.36.0: Pipelines galore, new caching method, training scripts, and more πŸŽ„
The release features a number of new image and video pipelines, a new caching method, a new training script, new kernels-powered attention backends, and more. It is a packed release, so make sure you read the release notes fully πŸš€

New image pipelines

  • Flux2: Flux2 is the latest generation of image generation and editing model from Black Forest Labs. It’s capable of taking multiple input images as reference, making it versatile for different use cases.
  • Z-Image: Z-Image is a best-in-class image generation model in the 6B-parameter regime. Thanks to @JerryWu-code in #12703.
  • QwenImage Edit Plus: It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in #12357.
  • Bria FIBO: FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in #12545.
  • Kandinsky Image Lite: Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in #12664.
  • ChronoEdit: ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in #12593.

New video pipelines

  • Sana-Video: Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in #12634.
  • Kandinsky 5: Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in #12478.
  • Hunyuan 1.5: HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
  • Wan Animate: Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and a driving motion video, it can either animate the character with the motion from the driving video, or replace the existing character in that video with the reference character.

New kernels-powered attention backends

The kernels library helps you save a lot of time by providing pre-built kernel interfaces for various environments and accelerators. This release features three new kernels-powered attention backends:

  • Flash Attention 3 (+ its varlen variant)
  • Flash Attention 2 (+ its varlen variant)
  • SAGE

This means that if any of the above backends is supported by your development environment, you can skip the manual process of building the corresponding kernels and just use:

# Make sure you have `kernels` installed: `pip install kernels`.
# You can choose `flash_hub` or `sage_hub`, too.
pipe.transformer.set_attention_backend("_flash_3_hub")

For more details, check out the documentation.
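Conceptually, switching backends works because attention implementations are registered under string names and looked up at call time. The sketch below illustrates that registry pattern in plain Python; it is not the actual diffusers internals, and the function bodies are placeholders:

```python
# Illustrative sketch of a string-keyed attention-backend registry,
# similar in spirit to set_attention_backend. NOT the real diffusers
# implementation; function bodies are placeholders.

_ATTENTION_BACKENDS = {}

def register_backend(name):
    """Decorator that records an attention function under a name."""
    def wrap(fn):
        _ATTENTION_BACKENDS[name] = fn
        return fn
    return wrap

def set_attention_backend(name):
    """Resolve a backend by name, raising on unknown names."""
    try:
        return _ATTENTION_BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"Unknown backend {name!r}; available: {sorted(_ATTENTION_BACKENDS)}"
        )

@register_backend("_flash_3_hub")
def flash3_hub(q, k, v):
    # A real backend would fetch a pre-built FA3 kernel via `kernels`
    # and dispatch to it; here we just tag which backend ran.
    return "flash-attention-3"

@register_backend("sage_hub")
def sage_hub(q, k, v):
    return "sage"

backend = set_attention_backend("_flash_3_hub")
print(backend(None, None, None))  # flash-attention-3
```

The benefit of the hub-backed variants is that the kernel binary itself is fetched pre-built for your environment, so resolving the name is all the user-facing work there is.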

TaylorSeer cache

TaylorSeer is now supported in Diffusers, delivering up to 3x speedups with negligible to no quality loss. Thanks to @toilaluan for contributing this in #12648.
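The core idea behind TaylorSeer-style caching is to predict a module's output at later denoising steps by Taylor-expanding from cached past outputs, instead of recomputing the module. The scalar sketch below is only a first-order illustration of that extrapolation; the real method operates on feature tensors inside the transformer:

```python
# First-order Taylor extrapolation of a cached value: the intuition
# behind TaylorSeer-style caching. Illustrative only; the actual
# method works on transformer feature maps across denoising timesteps.

def taylor_predict(f_prev, f_curr, t_prev, t_curr, t_next):
    """Predict f(t_next) from two cached values via a finite-difference slope."""
    slope = (f_curr - f_prev) / (t_curr - t_prev)  # first-derivative estimate
    return f_curr + slope * (t_next - t_curr)

# Suppose an expensive module's output follows f(t) = t**2 and we cached
# f at t=1.0 and t=2.0; extrapolate to t=3.0 instead of recomputing.
pred = taylor_predict(1.0, 4.0, 1.0, 2.0, 3.0)
print(pred)  # 7.0 (the exact value is 9.0; higher orders shrink this error)
```

Because the prediction is a cheap arithmetic update rather than a forward pass, skipping recomputation on the predicted steps is where the speedup comes from.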

New training script

Our Flux.2 integration features a LoRA fine-tuning script that you can check out here. We provide a number of optimizations to help make it run on consumer GPUs.

Misc

  • Reusing AttentionMixin: Making certain compatible models subclass from the AttentionMixin class helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out #12463.
  • Diffusers backend in SGLang: sgl-project/sglang#14112
  • We started the Diffusers MVP program to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.

All commits

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @yiyixuxu
    • ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
    • Fix: Add _skip_keys for AutoencoderKLWan (#12523)
    • HunyuanImage21 (#12333)
    • [modular] better warn message (#12573)
    • [modular]pass hub_kwargs to load_config (#12577)
    • [modular] wan! (#12611)
    • fix copies (#12637)
    • fix dispatch_attention_fn check (#12636)
    • [modular] add a check (#12628)
    • Hunyuanvideo15 (#12696)
    • [HunyuanVideo1.5] support step-distilled (#12802)
  • @leffff
    • Kandinsky 5 is finally in Diffusers! (#12478)
    • Kandinsky 5 10 sec (NABLA suport) (#12520)
    • Kandinsky 5.0 Docs fixes (#12582)
    • Kandinsky 5.0 Video Pro and Image Lite (#12664)
  • @dg845
    • Remove Qwen Image Redundant RoPE Cache (#12452)
    • [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) (#12526)
    • Update Wan Animate Docs (#12658)
    • Revert AutoencoderKLWan's dim_mult default value back to list (#12640)
  • @DN6
    • Raise warning instead of error when imports are missing for custom code (#12513)
    • Handle deprecated transformer classes (#12517)
    • Deprecate Stable Cascade (#12537)
    • [Pipelines] Enable Wan VACE to run since single transformer (#12428)
    • [Modular] Fix for custom block kwargs (#12561)
    • [Modular] Allow custom blocks to be saved to local_dir (#12381)
    • Fix custom code loading in Automodel (#12571)
    • [Modular] Allow ModularPipeline to load from revisions (#12592)
    • [Modular] Some clean up for Modular tests (#12579)
    • [CI] Push test fix (#12617)
    • [CI] Fix typo in uv install (#12618)
    • Fix Context Parallel validation checks (#12446)
    • [Modular] Clean up docs (#12604)
    • [CI] Remove unittest dependency from testing_utils.py (#12621)
    • [Modular] Add Custom Blocks guide to doc (#12339)
    • [CI] Make CI logs less verbose (#12674)
    • [CI] Temporarily pin transformers (#12677)
    • [CI] Fix indentation issue in workflow files (#12685)
    • [CI] Fix failing Pipeline CPU tests (#12681)
    • [Modular] Add single file support to Modular (#12383)
    • Deprecate upcast_vae in SDXL based pipelines (#12619)
  • @DavidBert
    • Add Photon model and pipeline support (#12456)
    • Prx (#12525)
    • Rope in float32 for mps or npu compatibility (#12665)
    • [PRX pipeline]: add 1024 resolution ratio bins (#12670)
    • PRX Set downscale_freq_shift to 0 for consistency with internal implementation (#12791)
  • @galbria
    • Bria fibo (#12545)
    • Rename BriaPipeline to BriaFiboPipeline in documentation (#12758)
  • @lawrence-cj
    • [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference (#12584)
    • SANA-Video Image to Video pipeline SanaImageToVideoPipeline support (#12634)
    • fix typo in docs (#12675)
  • @zhangjiewu
  • @delmalih
    • Improve docstrings and type hints in scheduling_amused.py (#12623)
    • Improve docstrings and type hints in scheduling_ddim.py (#12622)
    • Improve docstrings and type hints in scheduling_ddpm.py (#12651)
    • Improve docstrings and type hints in scheduling_euler_discrete.py (#12654)
    • Improve docstrings and type hints in scheduling_pndm.py (#12676)
    • Improve docstrings and type hints in scheduling_lms_discrete.py (#12678)
    • Improve docstrings and type hints in scheduling_dpmsolver_multistep.py (#12710)
    • [Docs] Update Imagen Video paper link in schedulers (#12724)
    • Improve docstrings and type hints in scheduling_heun_discrete.py (#12726)
    • Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py (#12766)
    • Improve docstrings and type hints in scheduling_unipc_multistep.py (#12767)
    • Improve docstrings and type hints in scheduling_deis_multistep.py (#12796)
  • @pratim4dasude
    • Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet (#12649)
  • @JerryWu-code
    • Add Support for Z-Image Series (#12703)
    • Support unittest for Z-image ⚑️ (#12715)
    • Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. (#12770)
  • @CalamitousFelicitousness
    • Add ZImage LoRA support and integrate into ZImagePipeline (#12750)
    • Add ZImageImg2ImgPipeline (#12751)
  • @DoctorKey
    • Add support for Ovis-Image (#12740)
