github huggingface/diffusers v0.38.0
Diffusers 0.38.0: New image and audio pipelines, Core library improvements, and more


New Pipelines

LLaDA2

LLaDA2 is a family of discrete diffusion language models that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.
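The block-wise unmasking loop can be illustrated with a toy sketch. This is not the LLaDA2 implementation: the confidence scores below are random stand-ins for real model logits, and a real model would sample its own token predictions rather than copy from a target.

```python
import random

MASK = "<mask>"

def block_unmask(target, steps=4, seed=0):
    """Toy confidence-ordered iterative unmasking.

    Start from a fully masked sequence; at each refinement step,
    reveal the positions with the highest (simulated) confidence.
    """
    rng = random.Random(seed)
    n = len(target)
    seq = [MASK] * n
    per_step = -(-n // steps)  # ceil division: positions revealed per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Score every still-masked position; keep the most confident ones.
        by_confidence = sorted(masked, key=lambda i: rng.random(), reverse=True)
        for i in by_confidence[:per_step]:
            seq[i] = target[i]  # a real model samples its prediction here
    return seq

result = block_unmask(["The", "cat", "sat", "down", "."], steps=2)
```

After the final step every position has been revealed, so the sequence contains no masks; the order in which positions were filled is what the confidence schedule controls.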

Nucleus-MoE

NucleusMoE-Image is a 17B-parameter sparse mixture-of-experts (MoE) model with roughly 2B active parameters per token, trained with efficiency at its core. The architecture highlights how well sparse MoE scales for image generation.
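The active-versus-total parameter gap comes from top-k expert routing: each token runs only a few experts, while the rest of the weights stay idle. A minimal sketch of that idea (not NucleusMoE-Image's actual router; the experts here are simple stand-in functions):

```python
import math

def top_k_route(logits, k=2):
    # Pick the k highest-scoring experts, softmax-normalize their gates.
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def sparse_moe(x, experts, logits, k=2):
    # Only the routed experts execute; unrouted parameters never run,
    # which is how total capacity can far exceed per-token compute.
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))

experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x / 2]
y = sparse_moe(10.0, experts, [0.1, 2.0, -1.0, 0.5], k=2)
```

Here only experts 1 and 3 run for this token (the two largest router logits), and their outputs are blended by the normalized gate weights.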

Thanks to @sippycoder for the contribution.

Ernie-Image

ERNIE-Image is a powerful and highly efficient image generation model with 8B parameters.

Thanks to @HsiaWinter for the contribution.

LongCat-AudioDiT

LongCat-AudioDiT is a text-to-audio diffusion model from Meituan LongCat.

Thanks to @RuixiangMa for the contribution.

ACE-Step 1.5

ACE-Step 1.5 generates variable-length stereo audio at 48 kHz (10 seconds to 10 minutes) from text prompts and optional lyrics. The full system pairs a language-model planner with a Diffusion Transformer (DiT) synthesizer; this pipeline wraps the DiT half of that stack. It consists of three components: an AutoencoderOobleck VAE that compresses waveforms into 25 Hz stereo latents, a Qwen3-based text encoder for prompt and lyric conditioning, and an AceStepTransformer1DModel DiT that operates in the VAE latent space using flow matching.
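The stated rates imply the temporal shapes involved: at a 25 Hz latent rate, each latent frame covers 48000 / 25 = 1920 waveform samples, and the 10-second-to-10-minute range maps to 250 to 15,000 latent frames. A small arithmetic sketch (the function name is illustrative, not a diffusers API):

```python
def latent_frames(duration_s, latent_rate_hz=25):
    # The VAE compresses 48 kHz stereo audio into 25 Hz latents,
    # so each latent frame spans 48000 / 25 = 1920 waveform samples.
    return round(duration_s * latent_rate_hz)

samples_per_frame = 48_000 // 25          # 1920 waveform samples per latent frame
shortest = latent_frames(10)              # 10 s  -> 250 latent frames
longest = latent_frames(10 * 60)          # 10 min -> 15,000 latent frames
```

This is why the DiT sees a sequence three orders of magnitude shorter than the raw waveform, which is what makes diffusion over multi-minute audio tractable.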

Thanks to @ChuxiJ for the contribution.

Flux.2 Small Decoder

Speed up your Flux.2 decoding with this new small decoder model from Black Forest Labs. You can check it out here. It was contributed by @huemin-art in this PR.

Modular Pipeline Support

We added modular pipeline support for LTX-2 and HunyuanVideo 1.5.

Core Library

All commits

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @kashif
    • [Discrete Diffusion] Add LLaDA2 pipeline (#13226)
    • [LLADA2] documentation fixes (#13333)
  • @howardzhang-cv
    • remove str option for quantization config in torchao (#13291)
    • change minimum version guard for torchao to 0.15.0 (#13355)
  • @sippycoder
  • @DN6
    • [CI] Refactor Cosmos Transformer Tests (#13335)
    • [CI] Hunyuan Transformer Tests Refactor (#13342)
    • [CI] Add PR/Issue Auto Labeler (#13380)
    • [CI] Add GLM Image Transformer Model Tests (#13344)
    • [CI] Use finegrained token for Issue Labeler (#13433)
    • [CI] Fix BnB tests (#13481)
  • @akshan-main
    • Cache RoPE freqs on device to avoid repeated CPU-GPU copy in QwenImage (#13406)
    • Fix HunyuanVideo 1.5 I2V by preprocessing image at pixel resolution i… (#13440)
    • [modular] Add LTX Video modular pipeline (#13378)
    • Add modular pipeline for HunyuanVideo 1.5 (#13389)
    • Add Ernie-Image modular pipeline (#13498)
  • @HsiaWinter
    • Add ernie image (#13432)
    • fix some dtype issue for gguf / some gpu backends (#13464)
  • @hlky
  • @RuixiangMa
    • [Feat] Adds LongCat-AudioDiT pipeline (#13390)
    • [chore] Add diffusers-format example to LongCatAudioDiTPipeline (#13483)
    • [Bugfix] Fix shape mismatch in LongCatAudioDiTTransformer conversion (#13494)
  • @adi776borate
    • Add FLUX.2 Klein Inpaint Pipeline (#13050)
    • Fix missing latents_bn_std dtype cast in VAE normalization (#13299)
  • @ChuxiJ
    • Add ACE-Step pipeline for text-to-music generation (#13095)
