github huggingface/diffusers v0.28.0
v0.28.0: Marigold, PixArt Sigma, AnimateDiff SDXL, InstantStyle, VQGAN Training Script, and more

Diffusion models are known for their abilities in the space of generative modeling. This release of diffusers introduces the first official pipeline (Marigold) for discriminative tasks such as depth estimation and surface normal estimation!

Starting with this release, we will also highlight the changes and features from the library that make it easy to integrate community checkpoints, features, and so on. Read on!

Marigold

Proposed in Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Marigold introduces a diffusion model and associated fine-tuning protocol for monocular depth estimation. It can also be extended to perform surface normal estimation.

(Image taken from the official repository)

The code snippet below shows how to use this pipeline for depth estimation:

import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")

Check out the API documentation here. We also have a detailed guide about the pipeline here.

Thanks to @toshas, one of the authors of Marigold, who contributed this in #7847.

🌀 Massive Refactor of from_single_file 🌀

We have further refactored from_single_file to align its logic more closely to the from_pretrained method. The biggest benefit of doing this is that it allows us to expand single file loading support beyond Stable Diffusion-like pipelines and models. It also makes it easier to load models that are saved and shared in their original format.

Some of the changes introduced in this refactor:

  1. When loading a single file checkpoint, we will attempt to use the keys present in the checkpoint to infer a model repository on the Hugging Face Hub that we can use to configure the pipeline. For example, if you are using a single file checkpoint based on SD 1.5, we would use the configuration files in the runwayml/stable-diffusion-v1-5 repository to configure the model components and pipeline.
  2. If this inferred configuration isn’t appropriate for your checkpoint, you can override it using the config argument, passing either a path to a local model repo or a repo id on the Hugging Face Hub:
     pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>)
  3. Model configuration arguments for the from_single_file method in pipelines, such as num_in_channels, scheduler_type, image_size, and upcast_attention, are deprecated. Passing them is an anti-pattern that we supported in previous versions of the library, when we assumed that it would only be relevant to Stable Diffusion based models. However, given that there is a demand to support other model types, we feel it is necessary for single-file loading behavior to adhere to the conventions set in our other loading methods. Configuring individual model components through a pipeline loading method is not something we support in from_pretrained, and therefore, we will be deprecating support for this behavior in from_single_file as well.
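
To make the first two points above more concrete, here is a minimal sketch of key-based inference with an explicit override. It is a toy illustration, not the library's actual logic: the helper infer_config_repo, the marker key, and the two-entry mapping are all assumptions made for this example.

```python
# Toy sketch: guess a default config repo from the keys of a single-file checkpoint.
# The marker key and the mapping below are illustrative assumptions, not the
# lookup tables that diffusers uses internally.

# Hypothetical marker: a key assumed to appear only in SDXL-style checkpoints.
SDXL_MARKER_KEY = "conditioner.embedders.1.model.transformer.resblocks.9.mlp.c_proj.bias"

DEFAULT_REPOS = {
    "sdxl": "stabilityai/stable-diffusion-xl-base-1.0",
    "sd15": "runwayml/stable-diffusion-v1-5",
}


def infer_config_repo(checkpoint_keys, config=None):
    """Return the repo used to configure the pipeline.

    An explicit `config` always wins, mirroring the override described above.
    """
    if config is not None:
        return config
    model_type = "sdxl" if SDXL_MARKER_KEY in checkpoint_keys else "sd15"
    return DEFAULT_REPOS[model_type]


# Made-up key sets standing in for real checkpoint state dicts.
sd15_keys = {"model.diffusion_model.input_blocks.0.0.weight"}
sdxl_keys = sd15_keys | {SDXL_MARKER_KEY}

print(infer_config_repo(sd15_keys))
print(infer_config_repo(sdxl_keys))
print(infer_config_repo(sd15_keys, config="./my-local-repo"))
```

The design point is only that inference is a fallback: whenever a config is passed explicitly, the checkpoint keys are never consulted.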

PixArt Sigma

PixArt Sigma is the successor to PixArt Alpha. PixArt Sigma is capable of directly generating images at 4K resolution. It can also produce images of markedly higher fidelity and improved alignment with text prompts. It comes with a massive sequence length of 300 (for reference, PixArt Alpha has a maximum sequence length of 120)!


(Taken from the project website.)

import torch
from diffusers import PixArtSigmaPipeline

# You can replace the checkpoint id with "PixArt-alpha/PixArt-Sigma-XL-2-512-MS" too.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)
# Enable memory optimizations.
pipe.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]

📃 Refer to the documentation here to learn more about PixArt Sigma.

Thanks to @lawrence-cj, one of the authors of PixArt Sigma, who contributed this in #7857.

AnimateDiff SDXL

@a-r-r-o-w contributed the Stable Diffusion XL (SDXL) version of AnimateDiff in #6721. However, note that this is currently an experimental feature, as only a beta release of the motion adapter checkpoint is available.

import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
)

# enable memory savings
# (enable_model_cpu_offload() returns None, so it must not be chained onto from_pretrained)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt="a panda surfing in the ocean, realistic, high quality",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
)

frames = output.frames[0]
export_to_gif(frames, "animation.gif")

📜 Refer to the documentation to learn more.

Block-wise LoRA

@UmerHA contributed support for controlling the scales of different LoRA blocks in a granular manner in #7352. Depending on the LoRA checkpoint one is using, this granular control can significantly impact the quality of the generated outputs. The following code block shows how this feature can be used during inference:

...

adapter_weight_scales = {"unet": {"down": 0, "mid": 1, "up": 0}}
pipe.set_adapters("pixel", adapter_weight_scales)
image = pipe(
    prompt, num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]

✍️ Refer to our documentation for more details and a full-fledged example.
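
As a rough mental model for the "down" / "mid" / "up" dictionary above, the sketch below resolves a scale for individual UNet layers. The helper scale_for_layer is hypothetical and not a diffusers function, and the layer names are illustrative examples of UNet module paths.

```python
# Toy sketch: resolve a per-layer LoRA scale from a block-wise scale dict.
# `scale_for_layer` is a hypothetical helper, not a diffusers function.

def scale_for_layer(layer_name, block_scales, default=1.0):
    """Map a UNet layer name to the scale of the block it belongs to."""
    if layer_name.startswith("down_blocks"):
        return block_scales.get("down", default)
    if layer_name.startswith("mid_block"):
        return block_scales.get("mid", default)
    if layer_name.startswith("up_blocks"):
        return block_scales.get("up", default)
    # Layers outside the three block groups keep the default scale.
    return default


unet_scales = {"down": 0, "mid": 1, "up": 0}
layers = [
    "down_blocks.1.attentions.0.proj_in",
    "mid_block.attentions.0.proj_in",
    "up_blocks.0.attentions.1.proj_out",
]
resolved = {name: scale_for_layer(name, unet_scales) for name in layers}
for name, scale in resolved.items():
    print(f"{name}: {scale}")
```

With the scales from the example above, only the mid-block LoRA layers stay active.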

InstantStyle

Granular control of scales can be extended to IP-Adapters too. @DannHuang contributed support for InstantStyle, i.e., granular control of IP-Adapter scales, in #7668. The following code block shows how this feature can be used when performing inference with IP-Adapters:

...

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

This way, one can generate images following only the style or layout from the image prompt, with significantly improved diversity. This is achieved by activating IP-Adapters only in specific parts of the model.

Check out the documentation here.
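
The nested scale dictionary can be assembled programmatically. The sketch below uses a hypothetical convenience helper, instantstyle_scale, which is not a diffusers API. It assumes that the second layer of up block_0 carries style and the second layer of down block_2 carries layout, matching the block names in the example above.

```python
# Sketch: build the nested scale dict passed to set_ip_adapter_scale().
# `instantstyle_scale` is a hypothetical convenience helper, not a diffusers API.
# Assumption: the second layer of up block_0 carries style and the second layer
# of down block_2 carries layout.

def instantstyle_scale(style=1.0, layout=None):
    """Return a scale dict enabling style, and optionally layout, injection."""
    scale = {"up": {"block_0": [0.0, style, 0.0]}}
    if layout is not None:
        scale["down"] = {"block_2": [0.0, layout]}
    return scale


style_only = instantstyle_scale()
style_and_layout = instantstyle_scale(style=1.0, layout=1.0)
print(style_only)
print(style_and_layout)
```

The resulting dictionary would then be passed to pipeline.set_ip_adapter_scale(...) exactly as in the snippet above.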

ControlNetXS

ControlNet-XS was introduced in ControlNet-XS by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the original ControlNet can be made much smaller and still produce good results. ControlNet-XS generates images comparable to a regular ControlNet, but it is 20-25% faster (see benchmark with StableDiffusion-XL) and uses ~45% less memory.

ControlNet-XS is supported for both Stable Diffusion and Stable Diffusion XL.

Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.

Custom Timesteps

We introduced custom timesteps support for some of our pipelines and schedulers. You can now set your scheduler with a list of arbitrary timesteps. For example, you can use the AYS timesteps schedule to achieve very nice results with only 10 denoising steps.

import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline
from diffusers.schedulers import AysSchedules

sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")
prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
image = pipe(prompt=prompt, timesteps=sampling_schedule).images[0]

Check out the documentation here.
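
When writing your own list of timesteps, it can help to sanity-check it first. The sketch below is one way to do that, under assumptions: validate_timesteps is a hypothetical helper, the 1000-step training horizon is a typical but assumed value, and the example schedule is made up (it is not the AYS schedule).

```python
# Sketch: sanity-check a custom timestep schedule before handing it to a pipeline.
# `validate_timesteps` is a hypothetical helper; the 1000-step training horizon
# and the example schedule are illustrative values, not the AYS schedule.

def validate_timesteps(timesteps, num_train_timesteps=1000):
    """Raise ValueError unless timesteps are integers, in range, and strictly descending."""
    if not timesteps:
        raise ValueError("timesteps must not be empty")
    for t in timesteps:
        if not isinstance(t, int) or not 0 <= t < num_train_timesteps:
            raise ValueError(f"timestep {t!r} is outside [0, {num_train_timesteps})")
    for earlier, later in zip(timesteps, timesteps[1:]):
        if later >= earlier:
            raise ValueError("timesteps must be strictly descending")
    return list(timesteps)


custom_schedule = validate_timesteps([999, 800, 600, 400, 200, 100, 50, 25, 10, 0])
print(len(custom_schedule), "denoising steps")
```

A validated list could then be passed through the timesteps argument, as in the pipeline call above.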

device_map in Pipelines 🧪

We have introduced experimental support for device_map in our pipelines. This feature becomes relevant when you have multiple accelerators across which to distribute the components of a pipeline. Currently, we support only the “balanced” device_map strategy. However, we plan to support other device mapping strategies relevant to diffusion models in the future.

from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", 
    torch_dtype=torch.float16, 
    device_map="balanced"
)
image = pipeline("a dog").images[0]

In cases where you might be limited to low VRAM accelerators, you can use device_map to benefit from them. Below, we simulate a situation where we have access to two GPUs, each having only 1 GB of VRAM (through the max_memory argument).

from diffusers import DiffusionPipeline
import torch

max_memory = {0: "1GB", 1: "1GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
    device_map="balanced",
    max_memory=max_memory,
)
image = pipeline("a dog").images[0]

📜 Refer to the documentation to learn more about it.
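
For intuition about what spreading components across devices under a memory budget can look like, here is a toy greedy placement. It is explicitly not the algorithm used by diffusers or accelerate: balanced_placement is a hypothetical function, and the component sizes and budgets are made-up numbers.

```python
# Toy sketch of a "balanced" placement: put each pipeline component on the
# device that currently has the most free memory. This is NOT the algorithm
# diffusers or accelerate uses; component sizes and budgets are made-up numbers.

def balanced_placement(component_sizes_gb, max_memory_gb):
    free = dict(max_memory_gb)
    placement = {}
    # Place the largest components first so they get the emptiest devices.
    for name, size in sorted(component_sizes_gb.items(), key=lambda kv: -kv[1]):
        device = max(free, key=free.get)
        if size > free[device]:
            raise ValueError(f"{name} ({size} GB) does not fit on any device")
        placement[name] = device
        free[device] -= size
    return placement


sizes = {"unet": 1.7, "text_encoder": 0.25, "vae": 0.17}
placement = balanced_placement(sizes, {0: 2.0, 1: 2.0})
print(placement)
```

In this toy run the largest component ends up alone on one device while the two smaller ones share the other.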

VQGAN Training Script 📈

VQGAN, proposed in Taming Transformers for High-Resolution Image Synthesis, is a crucial component in the modern generative image modeling toolbox. Once it is trained, its encoder can be leveraged to compute general-purpose tokens from input images.

Thanks to @isamu-isozaki, who contributed a script and related utilities to train VQGANs in #5483. For details, refer to the official training directory.

VideoProcessor Class

Similar to the VaeImageProcessor class, we have introduced a VideoProcessor to help make the preprocessing and postprocessing of videos easier and a little more streamlined across the pipelines. Refer to the documentation to learn more.

New Guides 📑

Starting with this release, we provide guides and tutorials to help users get started with some of the most frequently used tasks in image and video generation. For this release, we have a series of three guides about outpainting with different techniques.

Official Callbacks

We introduced official callbacks that you can conveniently plug into your pipeline. For example, you can turn off classifier-free guidance after a given fraction of the denoising steps with SDXLCFGCutoffCallback.

import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback

callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"
out = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    callback_on_step_end=callback,
)

Read more in our documentation 📜
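
The cutoff idea can also be written as a plain function with the callback_on_step_end signature. The sketch below is a simplification under stated assumptions: make_cfg_cutoff is a hypothetical helper name, a SimpleNamespace stands in for a real pipeline so the logic runs without loading a model, and the adjustments an SDXL callback would also need to make to the conditioning tensors in callback_kwargs are omitted.

```python
# Sketch of a function-style callback with the callback_on_step_end signature:
# (pipeline, step_index, timestep, callback_kwargs) -> callback_kwargs.
# A SimpleNamespace stands in for a real pipeline so the logic runs without
# loading a model; `make_cfg_cutoff` is a hypothetical helper name.
from types import SimpleNamespace


def make_cfg_cutoff(cutoff_step_ratio):
    def callback(pipeline, step_index, timestep, callback_kwargs):
        cutoff_step = int(pipeline.num_timesteps * cutoff_step_ratio)
        if step_index == cutoff_step:
            # Guidance is switched off for all steps after this one.
            pipeline._guidance_scale = 0.0
        return callback_kwargs

    return callback


fake_pipeline = SimpleNamespace(num_timesteps=25, _guidance_scale=7.5)
callback = make_cfg_cutoff(cutoff_step_ratio=0.4)

scales_seen = []
for step_index in range(fake_pipeline.num_timesteps):
    scales_seen.append(fake_pipeline._guidance_scale)  # scale in effect during this step
    callback(fake_pipeline, step_index, timestep=None, callback_kwargs={})

print(scales_seen.count(7.5), "guided steps,", scales_seen.count(0.0), "unguided steps")
```

Because the callback fires at the end of a step, the step at the cutoff index itself still runs with guidance in this sketch.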

Community Pipelines and from_pipe API

Starting with this release note, we will highlight the new community pipelines! More and more of our pipelines were added as community pipelines first and graduated as official pipelines once people started to use them a lot! We do not require community pipelines to follow diffusers’ coding style, so it is the easiest way to contribute to diffusers 😊 

We also introduced a from_pipe API that’s very useful for the community pipelines that share checkpoints with our official pipelines and improve generation quality in some way:) You can use from_pipe(...) to load many community pipelines without additional memory requirements. With this API, you can easily switch between different pipelines to apply different techniques.

Read more about from_pipe API in our documentation 📃.

Here are four new community pipelines since our last release.

BoxDiff

BoxDiff lets you use bounding box coordinates for a more controlled generation. Here is an example of how you can apply this technique on a Stable Diffusion pipeline you have already created (i.e., pipe_sd in the example below):

pipe_box = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="pipeline_stable_diffusion_boxdiff",
)
pipe_box.enable_model_cpu_offload()
phrases = ["aurora","reindeer","meadow","lake","mountain"]
boxes = [[1,3,512,202], [75,344,421,495], [1,327,508,507], [2,217,507,341], [1,135,509,242]]
boxes = [[x / 512 for x in box] for box in boxes]

generator = torch.Generator(device="cpu").manual_seed(42)
# NOTE: `prompt` is not defined in this snippet; set it to your text prompt before this call.
images = pipe_box(
    prompt,
    boxdiff_phrases=phrases,
    boxdiff_boxes=boxes,
    boxdiff_kwargs={
        "attention_res": 16,
        "normalize_eot": True
    },
    num_inference_steps=50,
    generator=generator,
).images

Check out this community pipeline here
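
The division by 512 in the snippet above converts pixel coordinates into fractions of the image size. A slightly more defensive version of that step might look like the sketch below. The helper normalize_boxes is hypothetical, and the [x_min, y_min, x_max, y_max] box layout is an assumption about the coordinate order.

```python
# Sketch: convert pixel-space boxes into fractional coordinates, as done with
# the division by 512 in the example above. `normalize_boxes` is a hypothetical
# helper; the [x_min, y_min, x_max, y_max] layout is an assumption.

def normalize_boxes(boxes, width=512, height=512):
    normalized = []
    for x_min, y_min, x_max, y_max in boxes:
        if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
            raise ValueError(
                f"box {[x_min, y_min, x_max, y_max]} does not fit a {width}x{height} image"
            )
        normalized.append([x_min / width, y_min / height, x_max / width, y_max / height])
    return normalized


pixel_boxes = [[1, 3, 512, 202], [75, 344, 421, 495]]
print(normalize_boxes(pixel_boxes))
```

Validating the boxes up front surfaces swapped or out-of-range coordinates before any generation time is spent.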

HD-Painter

HD-Painter can enhance inpainting pipelines with improved prompt faithfulness and generate higher-resolution images (up to 2K). You can switch from BoxDiff to HD-Painter like this:

from diffusers import DDIMScheduler
from diffusers.utils import load_image

pipe = DiffusionPipeline.from_pipe(
    pipe_box,
    custom_pipeline="hd_painter",
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

prompt = "wooden boat"
init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/images/2.jpg")
mask_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/__assets__/samples/masks/2.png")

image = pipe(
    prompt,
    init_image,
    mask_image,
    use_rasg=True,
    use_painta=True,
    generator=torch.manual_seed(12345),
).images[0]

Check out this community pipeline here

Differential Diffusion

Differential Diffusion enables customization of the amount of change per pixel or per image region. It’s very effective in inpainting and outpainting.

pipeline = DiffusionPipeline.from_pipe(
    pipe_sdxl,
    custom_pipeline="pipeline_stable_diffusion_xl_differential_img2img",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

prompt = "a green pear"
negative_prompt = "blurry"

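# NOTE: `image` (the input image) and `mask` (the change map) are assumed to be prepared beforehand.
image = pipeline(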
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=image,
    image=image,
    strength=1.0,
    map=mask,
).images[0]

Check out this community pipeline here.

FRESCO

FRESCO (FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation) enables zero-shot video-to-video translation. Learn more about it here.

All Commits

  • clean dep installation step in push_tests by @sayakpaul in #7382
  • [LoRA test suite] refactor the test suite and cleanse it by @sayakpaul in #7316
  • [Custom Pipelines with Custom Components] fix multiple things by @sayakpaul in #7304
  • Fix typos by @standardAI in #7411
  • fix: enable unet_3d_condition to support time_cond_proj_dim by @yhZhai in #7364
  • add: space within docs to calculate mememory usage. by @sayakpaul (direct commit on v0.28.0-release)
  • Revert "add: space within docs to calculate mememory usage." by @sayakpaul (direct commit on v0.28.0-release)
  • [Docs] add missing output image by @sayakpaul in #7425
  • add a "Community Scripts" section by @yiyixuxu in #7358
  • add: space for calculating memory usagee. by @sayakpaul in #7414
  • [refactor] Fix FreeInit behaviour by @a-r-r-o-w in #7410
  • Remove distutils by @sayakpaul in #7455
  • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline by @standardAI in #7262
  • [Research Projects] ORPO diffusion for alignment by @sayakpaul in #7423
  • Additional Memory clean up for slow tests by @DN6 in #7436
  • Fix for str_to_bool definition in testing utils by @DN6 in #7461
  • [Docs] Fix typos by @standardAI in #7451
  • Fixed minor error in test_lora_layers_peft.py by @UmerHA in #7394
  • Small ldm3d fix by @estelleafl in #7464
  • [tests] skip dynamo tests when python is 3.12. by @sayakpaul in #7458
  • feat: support DoRA LoRA from community by @sayakpaul in #7371
  • Fix broken link by @salcc in #7472
  • Update train_dreambooth_lora_sd15_advanced.py by @ernestchu in #7433
  • [Training utils] add kohya conversion dict. by @sayakpaul in #7435
  • Fix Tiling in ConsistencyDecoderVAE by @standardAI in #7290
  • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs by @bghira in #7446
  • Fix missing raise statements in check_inputs by @TonyLianLong in #7473
  • Add device arg to offloading with combined pipelines by @Disty0 in #7471
  • fix torch.compile for multi-controlnet of sdxl inpaint by @yiyixuxu in #7476
  • [chore] make the istructions on fetching all commits clearer. by @sayakpaul in #7474
  • Skip test_lora_fuse_nan on mps by @UmerHA in #7481
  • [Chore] Fix Colab notebook links in README.md by @thliang01 in #7495
  • [Modeling utils chore] import load_model_dict_into_meta only once by @sayakpaul in #7437
  • Improve nightly tests by @sayakpaul in #7385
  • add: a helpful message when quality and repo consistency checks fail. by @sayakpaul in #7475
  • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) by @bghira in #7447
  • cpu_offload: remove all hooks before offload by @yiyixuxu in #7448
  • Bug fix for controlnetpipeline check_image by @Fantast616 in #7103
  • fix OOM for test_vae_tiling by @yiyixuxu in #7510
  • [Tests] Speed up some fast pipeline tests by @sayakpaul in #7477
  • Memory clean up on all Slow Tests by @DN6 in #7514
  • Implements Blockwise lora by @UmerHA in #7352
  • Quick-Fix for #7352 block-lora by @UmerHA in #7523
  • add Instant id sdxl image2image pipeline by @linoytsaban in #7507
  • Perturbed-Attention Guidance by @HyoungwonCho in #7512
  • Add final_sigma_zero to UniPCMultistep by @Beinsezii in #7517
  • Fix IP Adapter Support for SAG Pipeline by @Stepheni12 in #7260
  • [Community pipeline] Marigold depth estimation update -- align with marigold v0.1.5 by @markkua in #7524
  • Fix typo in CPU offload test by @DN6 in #7542
  • Fix SVD bug (shape of time_context) by @KimbingNg in #7268
  • fix the cpu offload tests by @yiyixuxu in #7544
  • add HD-Painter pipeline by @haikmanukyan in #7520
  • add a from_pipe method to DiffusionPipeline by @yiyixuxu in #7241
  • [Community pipeline] SDXL Differential Diffusion Img2Img Pipeline by @asomoza in #7550
  • Fix FreeU tests by @DN6 in #7540
  • [Release tests] make nightly workflow dispatchable. by @sayakpaul in #7541
  • [Chore] remove class assignments for linear and conv. by @sayakpaul in #7553
  • [Tests] Speed up fast pipelines part II by @sayakpaul in #7521
  • 7529 do not disable autocast for cuda devices by @bghira in #7530
  • add: utility to format our docs too 📜 by @sayakpaul in #7314
  • UniPC Multistep fix tensor dtype/device on order=3 by @Beinsezii in #7532
  • UniPC Multistep add rescale_betas_zero_snr by @Beinsezii in #7531
  • [Core] refactor transformers 2d into multiple init variants. by @sayakpaul in #7491
  • [Chore] increase number of workers for the tests. by @sayakpaul in #7558
  • Update pipeline_animatediff_video2video.py by @AbhinavGopal in #7457
  • Skip test_freeu_enabled on MPS by @UmerHA in #7570
  • [Tests] reduce block sizes of UNet and VAE tests by @sayakpaul in #7560
  • [IF| add set_begin_index for all IF pipelines by @yiyixuxu in #7577
  • Add AudioLDM2 TTS by @tuanh123789 in #5381
  • Allow more arguments to be passed to convert_from_ckpt by @w4ffl35 in #7222
  • [Docs] fix bugs in callback docs by @Adenialzz in #7594
  • Add missing restore() EMA call in train SDXL script by @christopher-beckham in #7599
  • disable test_conversion_when_using_device_map by @yiyixuxu in #7620
  • Multi-image masking for single IP Adapter by @fabiorigano in #7499
  • add utilities for updating diffusers pipeline metadata. by @sayakpaul in #7573
  • [Core] refactor transformer_2d forward logic into meaningful conditions. by @sayakpaul in #7489
  • [Workflows] remove installation of libsndfile1-dev and libgl1 from workflows by @sayakpaul in #7543
  • [Core] add "balanced" device_map support to pipelines by @sayakpaul in #6857
  • add the option of upsample function for tiny vae by @IDKiro in #7604
  • [docs] remove duplicate tip block. by @sayakpaul in #7625
  • Modularize instruct_pix2pix SD inferencing during and after training in examples by @satani99 in #7603
  • [Tests] reduce the model sizes in the SD fast tests by @sayakpaul in #7580
  • [docs] Prompt enhancer by @stevhliu in #7565
  • [docs] T2I by @stevhliu in #7623
  • Fix cpu offload related slow tests by @yiyixuxu in #7618
  • [Core] fix img2img pipeline for Playground by @sayakpaul in #7627
  • Skip PEFT LoRA Scaling if the scale is 1.0 by @stevenjlm in #7576
  • LCM Distill Scripts Fix Bug when Initializing Target U-Net by @dg845 in #6848
  • Fixed YAML loading. by @YiqinZhao in #7579
  • fix: Replaced deprecated logger.warn with logger.warning by @Sai-Suraj-27 in #7643
  • FIX Setting device for DoRA parameters by @BenjaminBossan in #7655
  • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects by @kabachuha in #7527
  • make docker-buildx mandatory. by @sayakpaul in #7652
  • fix: metadata token by @sayakpaul in #7631
  • don't install peft from the source with uv for now. by @sayakpaul in #7679
  • Fixing implementation of ControlNet-XS by @UmerHA in #6772
  • [Core] is_cosxl_edit arg in SDXL ip2p. by @sayakpaul in #7650
  • [Docs] Add TGATE in section optimization by @WentianZhang-ML in #7639
  • fix: Updated ruff configuration to avoid deprecated configuration warning by @Sai-Suraj-27 in #7637
  • Don't install PEFT with UV in slow tests by @DN6 in #7697
  • [Workflows] remove installation of redundant modules from flax PR tests by @sayakpaul in #7662
  • [Docs] Update TGATE in section optimization. by @WentianZhang-ML in #7698
  • [docs] Pipeline loading by @stevhliu in #7684
  • Add tailscale action to push_test by @glegendre01 in #7709
  • Move IP Adapter Face ID to core by @fabiorigano in #7186
  • adding back test_conversion_when_using_device_map by @yiyixuxu in #7704
  • Cast height, width to int inside prepare latents by @DN6 in #7691
  • Cleanup ControlnetXS by @DN6 in #7701
  • fix: Fixed type annotations for compatability with python 3.8 by @Sai-Suraj-27 in #7648
  • fix/add tailscale key in case of failure by @glegendre01 in #7719
  • Animatediff Controlnet Community Pipeline IP Adapter Fix by @AbhinavGopal in #7413
  • Update Wuerschten Test by @DN6 in #7700
  • Fix Kandinksy V22 tests by @DN6 in #7699
  • [docs] AutoPipeline by @stevhliu in #7714
  • Remove redundant lines by @philipbutler in #7396
  • Support InstantStyle by @DannHuang in #7668
  • Restore AttnProcessor2_0 in unload_ip_adapter by @fabiorigano in #7727
  • fix: Fixed a wrong decorator by modifying it to @classmethod by @Sai-Suraj-27 in #7653
  • [Metadat utils] fix: json lines ordering. by @sayakpaul in #7744
  • [docs] Clean up toctree by @stevhliu in #7715
  • Fix failing VAE tiling test by @DN6 in #7747
  • Fix test for consistency decoder. by @DN6 in #7746
  • PixArt-Sigma Implementation by @lawrence-cj in #7654
  • [PixArt] fix small nits in pixart sigma by @sayakpaul in #7767
  • [Tests] mark UNetControlNetXSModelTests::test_forward_no_control to be flaky by @sayakpaul in #7771
  • Fix lora device test by @sayakpaul in #7738
  • [docs] Reproducible pipelines by @stevhliu in #7769
  • [docs] Refactor image quality docs by @stevhliu in #7758
  • Convert RGB to BGR for the SDXL watermark encoder by @btlorch in #7013
  • [docs] Fix AutoPipeline docstring by @stevhliu in #7779
  • Add PixArtSigmaPipeline to AutoPipeline mapping by @Beinsezii in #7783
  • [Docs] Update image masking and face id example by @fabiorigano in #7780
  • Add DREAM training by @AmericanPresidentJimmyCarter in #6381
  • [Scheduler] introduce sigma schedule. by @sayakpaul in #7649
  • Update InstantStyle usage in IP-Adapter documentation by @DannHuang in #7806
  • Check for latents, before calling prepare_latents - sdxlImg2Img by @nileshkokane01 in #7582
  • Add debugging workflow by @DN6 in #7778
  • [Pipeline] Fix error of SVD pipeline when num_videos_per_prompt > 1 by @wuyushuwys in #7786
  • Safetensor loading in AnimateDiff conversion scripts by @DN6 in #7764
  • Adding TextualInversionLoaderMixin for the controlnet_inpaint_sd_xl pipeline by @jschoormans in #7288
  • Added get_velocity function to EulerDiscreteScheduler. by @RuiningLi in #7733
  • Set main_input_name in StableDiffusionSafetyChecker to "clip_input" by @clinty in #7500
  • [Tests] reduce the model size in the ddim fast test by @ariG23498 in #7803
  • [Tests] reduce the model size in the ddpm fast test by @ariG23498 in #7797
  • [Tests] reduce the model size in the amused fast test by @ariG23498 in #7804
  • [Core] introduce _no_split_modules to ModelMixin by @sayakpaul in #6396
  • Add B-Lora training option to the advanced dreambooth lora script by @linoytsaban in #7741
  • SSH Runner Workflow Update by @DN6 in #7822
  • Fix CPU offload in docstring by @standardAI in #7827
  • [docs] Community pipelines by @stevhliu in #7819
  • Fix for pipeline slow test fetcher by @DN6 in #7824
  • [Tests] fix: device map tests for models by @sayakpaul in #7825
  • update the logic of is_sequential_cpu_offload by @yiyixuxu in #7788
  • [ip-adapter] fix ip-adapter for StableDiffusionInstructPix2PixPipeline by @yiyixuxu in #7820
  • [Tests] reduce the model size in the audioldm fast test by @ariG23498 in #7833
  • Fix key error for dictionary with randomized order in convert_ldm_unet_checkpoint by @yunseongcho in #7680
  • Fix hanging pipeline fetching by @DN6 in #7837
  • Update download diff format tests by @DN6 in #7831
  • Update CI cache by @DN6 in #7832
  • move to new runners by @glegendre01 in #7839
  • Change GPU Runners by @glegendre01 in #7840
  • Update deps for pipe test fetcher by @DN6 in #7838
  • [Tests] reduce the model size in the blipdiffusion fast test by @ariG23498 in #7849
  • Respect resume_download deprecation by @Wauplin in #7843
  • Remove installing python again in container by @DN6 in #7852
  • Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. by @HelloWorldBeginner in #7816
  • [docs] LCM by @stevhliu in #7829
  • Ci - change cache folder by @glegendre01 in #7867
  • [docs] Distilled inference by @stevhliu in #7834
  • Fix for "no lora weight found module" with some loras by @asomoza in #7875
  • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated by @bghira in #7880
  • Modification on the PAG community pipeline (re) by @HyoungwonCho in #7876
  • Fix image upcasting by @standardAI in #7858
  • Check shape and remove deprecated APIs in scheduling_ddpm_flax.py by @ppham27 in #7703
  • [Pipeline] AnimateDiff SDXL by @a-r-r-o-w in #6721
  • fix offload test by @yiyixuxu in #7868
  • Allow users to save SDXL LoRA weights for only one text encoder by @dulacp in #7607
  • Remove dead code and fix f-string issue by @standardAI in #7720
  • Fix several imports by @standardAI in #7712
  • [Refactor] Better align from_single_file logic with from_pretrained by @DN6 in #7496
  • [Tests] fix things after #7013 by @sayakpaul in #7899
  • Set max parallel jobs on slow test runners by @DN6 in #7878
  • fix _optional_components in StableCascadeCombinedPipeline by @yiyixuxu in #7894
  • [scheduler] support custom timesteps and sigmas by @yiyixuxu in #7817
  • upgrade to python 3.10 in the Dockerfiles by @sayakpaul in #7893
  • add missing image processors to the docs by @sayakpaul in #7910
  • [Core] introduce videoprocessor. by @sayakpaul in #7776
  • #7535 Update FloatTensor type hints to Tensor by @vanakema in #7883
  • fix bugs when using deepspeed in sdxl by @HelloWorldBeginner in #7917
  • add custom sigmas and timesteps for StableDiffusionXLControlNet pipeline by @neuron-party in #7913
  • fix: Fixed a wrong link to supported python versions in contributing.md file by @Sai-Suraj-27 in #7638
  • [Core] fix offload behaviour when device_map is enabled. by @sayakpaul in #7919
  • Add Ascend NPU support for SDXL. by @HelloWorldBeginner in #7916
  • Official callbacks by @asomoza in #7761
  • fix AnimateDiff creation with a unet loaded with IP Adapter by @fabiorigano in #7791
  • [LoRA] Fix LoRA tests (side effects of RGB ordering) part ii by @sayakpaul in #7932
  • fix multicontrolnet save_pretrained logic for compatibility by @rebel-kblee in #7821
  • Update requirements.txt for text_to_image by @ktakita1011 in #7892
  • Bump transformers from 4.36.0 to 4.38.0 in /examples/research_projects/realfill by @dependabot[bot] in #7635
  • fix VAE loading issue in train_dreambooth by @bssrdf in #7632
  • Expansion proposal of diffusers-cli env by @standardAI in #7403
  • update to use hf-workflows for reporting the Docker build statuses by @sayakpaul in #7938
  • [Core] separate the loading utilities in modeling similar to pipelines. by @sayakpaul in #7943
  • Fix added_cond_kwargs when using IP-Adapter in StableDiffusionXLControlNetInpaintPipeline by @detkov in #7924
  • [Pipeline] Adding BoxDiff to community examples by @zjysteven in #7947
  • [tests] decorate StableDiffusion21PipelineSingleFileSlowTests with slow. by @sayakpaul in #7941
  • Adding VQGAN Training script by @isamu-isozaki in #5483
  • move to GH hosted M1 runner by @glegendre01 in #7949
  • [Workflows] add a workflow that can be manually triggered on a PR. by @sayakpaul in #7942
  • refactor: Refactored code by Merging isinstance calls by @Sai-Suraj-27 in #7710
  • Fix the text tokenizer name in logger warning of PixArt pipelines by @liang-hou in #7912
  • Fix AttributeError in train_lcm_distill_lora_sdxl_wds.py by @jainalphin in #7923
  • Consistent SDXL Controlnet callback tensor inputs by @asomoza in #7958
  • remove unsafe workflow. by @sayakpaul in #7967
  • [tests] fix Pixart Sigma tests by @sayakpaul in #7966
  • Fix typo in "attention" by @jacobmarks in #7977
  • Update pipeline_controlnet_inpaint_sd_xl.py by @detkov in #7983
  • [docs] add doc for PixArtSigmaPipeline by @lawrence-cj in #7857
  • Passing cross_attention_kwargs to StableDiffusionInstructPix2PixPipeline by @AlexeyZhuravlev in #7961
  • fix: Fixed few docstrings according to the Google Style Guide by @Sai-Suraj-27 in #7717
  • Make VAE compatible to torch.compile() by @rootonchair in #7984
  • [docs] VideoProcessor by @stevhliu in #7965
  • Use HF_TOKEN env var in CI by @Wauplin in #7993
  • fix: Attribute error in Logger object (logger.warning) by @AMohamedAakhil in #8183
  • Remove unnecessary single file tests for SD Cascade UNet by @DN6 in #7996
  • Fix resize issue in SVD pipeline with VideoProcessor by @DN6 in #8229
  • Create custom container for doc builder by @DN6 in #8263
  • Use freedesktop_os_release() in diffusers cli for Python >=3.10 by @DN6 in #8235
  • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation by @SingleZombie in #8239
  • [Chore] run the documentation workflow in a custom container. by @sayakpaul in #8266
  • Respect resume_download deprecation V2 by @Wauplin in #8267
  • Clean up from_single_file docs by @DN6 in #8268
  • sampling bug fix in diffusers tutorial "basic_training.md" by @yue-here in #8223
  • Fix a grammatical error in the raise messages by @standardAI in #8272
  • Fix CPU Offloading Usage & Typos by @standardAI in #8230
  • Add details about 1-stage implementation in I2VGen-XL docs by @dhaivat1729 in #8282
  • [Workflows] add a more secure way to run tests from a PR. by @sayakpaul in #7969
  • Add zip package to doc builder image by @DN6 in #8284
  • [Pipeline] Marigold depth and normals estimation by @toshas in #7847
  • Release: v0.28.0 by @sayakpaul (direct commit on v0.28.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @standardAI
    • Fix typos (#7411)
    • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline (#7262)
    • [Docs] Fix typos (#7451)
    • Fix Tiling in ConsistencyDecoderVAE (#7290)
    • Fix CPU offload in docstring (#7827)
    • Fix image upcasting (#7858)
    • Remove dead code and fix f-string issue (#7720)
    • Fix several imports (#7712)
    • Expansion proposal of diffusers-cli env (#7403)
    • Fix a grammatical error in the raise messages (#8272)
    • Fix CPU Offloading Usage & Typos (#8230)
  • @a-r-r-o-w
    • [refactor] Fix FreeInit behaviour (#7410)
    • [Pipeline] AnimateDiff SDXL (#6721)
  • @UmerHA
    • Fixed minor error in test_lora_layers_peft.py (#7394)
    • Skip test_lora_fuse_nan on mps (#7481)
    • Implements Blockwise lora (#7352)
    • Quick-Fix for #7352 block-lora (#7523)
    • Skip test_freeu_enabled on MPS (#7570)
    • Fixing implementation of ControlNet-XS (#6772)
  • @bghira
    • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs (#7446)
    • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) (#7447)
    • 7529 do not disable autocast for cuda devices (#7530)
    • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)
  • @HyoungwonCho
    • Perturbed-Attention Guidance (#7512)
    • Modification on the PAG community pipeline (re) (#7876)
  • @haikmanukyan
    • add HD-Painter pipeline (#7520)
  • @fabiorigano
    • Multi-image masking for single IP Adapter (#7499)
    • Move IP Adapter Face ID to core (#7186)
    • Restore AttnProcessor2_0 in unload_ip_adapter (#7727)
    • [Docs] Update image masking and face id example (#7780)
    • fix AnimateDiff creation with a unet loaded with IP Adapter (#7791)
  • @kabachuha
    • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects (#7527)
  • @lawrence-cj
    • PixArt-Sigma Implementation (#7654)
    • [docs] add doc for PixArtSigmaPipeline (#7857)
  • @vanakema
    • #7535 Update FloatTensor type hints to Tensor (#7883)
  • @zjysteven
    • [Pipeline] Adding BoxDiff to community examples (#7947)
  • @isamu-isozaki
    • Adding VQGAN Training script (#5483)
  • @SingleZombie
    • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation (#8239)
  • @toshas
    • [Pipeline] Marigold depth and normals estimation (#7847)
