Shap-E
Shap-E is a 3D generation model from OpenAI, introduced in Shap-E: Generating Conditional 3D Implicit Functions. Diffusers supports both text-to-3D and image-to-3D generation with it.
Text to 3D
```python
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

ckpt_id = "openai/shap-e"
pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")

guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "cake_3d.gif")
```
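The `guidance_scale=15.0` above controls classifier-free guidance: the model is run on both an unconditional and a text-conditioned input, and the final prediction is extrapolated from the unconditional one toward the conditioned one. A minimal sketch of that combination step (the tensor names here are illustrative, not the pipeline's internals):

```python
import torch

def apply_cfg(noise_pred_uncond, noise_pred_cond, guidance_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional output and toward the text-conditioned one.
    return noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)

uncond = torch.zeros(2, 4)
cond = torch.ones(2, 4)

# guidance_scale=1.0 recovers the conditional prediction unchanged;
# larger values amplify the conditional direction.
weak = apply_cfg(uncond, cond, 1.0)
strong = apply_cfg(uncond, cond, 15.0)
```

Higher values follow the prompt more closely at the cost of sample diversity.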
Image to 3D
```python
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"
image = load_image(img_url)

generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0
images = pipe(
    image,
    num_images_per_prompt=batch_size,
    generator=generator,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
    output_type="pil",
).images

gif_path = export_to_gif(images[0], "burger_sampled_3d.gif")
```
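`export_to_gif` takes the list of rendered frames (PIL images) and writes them out as an animated GIF. A rough standalone equivalent using only Pillow, with dummy frames and an illustrative frame duration:

```python
from PIL import Image

def frames_to_gif(frames, path, duration_ms=100):
    # Save the first frame and append the rest; loop=0 repeats forever.
    frames[0].save(
        path,
        save_all=True,
        append_images=frames[1:],
        duration=duration_ms,
        loop=0,
    )
    return path

# Dummy solid-color frames standing in for the pipeline's rendered views
frames = [Image.new("RGB", (64, 64), (i * 4, 0, 0)) for i in range(64)]
frames_to_gif(frames, "demo.gif")
```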
For more details, check out the official documentation.
The model was contributed by @yiyixuxu in #3742.
Consistency models
Consistency models are diffusion-related generative models that support fast one-step or few-step image generation. They were proposed by OpenAI in Consistency Models.
```python
import torch
from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# One-step sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# One-step sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multi-step sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")
```
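Like other Diffusers pipelines, `ConsistencyModelPipeline` accepts a `generator` argument so samples are reproducible across runs. The determinism comes from seeding a `torch.Generator`; a quick CPU-only illustration of that mechanism (the shapes are arbitrary):

```python
import torch

def sample_noise(seed, shape=(2, 3)):
    # A freshly seeded generator makes torch.randn deterministic.
    generator = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=generator)

a = sample_noise(0)
b = sample_noise(0)
c = sample_noise(1)
assert torch.equal(a, b)      # same seed -> identical noise
assert not torch.equal(a, c)  # different seed -> different noise
```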
For more details, see the official docs.
The model was contributed by our community members @dg845 and @ayushtues in #3492.
Video-to-Video
Earlier video generation pipelines tended to produce watermarked videos because watermarks were present in their pretraining data. With the newly added checkpoints below, we can now generate watermark-free videos:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimizations
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)
```
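`enable_forward_chunking(chunk_size=1, dim=1)` trades speed for memory by running feed-forward layers on small slices along the frame dimension rather than on the whole sequence at once. A toy version of the idea (the module and shapes here are made up for illustration, not the UNet's actual layers):

```python
import torch

def chunked_forward(module, hidden_states, chunk_size=1, dim=1):
    # Split the input along `dim`, run each chunk through the module
    # separately, and stitch the outputs back together. Peak activation
    # memory now scales with the chunk size instead of the full length.
    num_chunks = hidden_states.shape[dim] // chunk_size
    chunks = hidden_states.chunk(num_chunks, dim=dim)
    return torch.cat([module(chunk) for chunk in chunks], dim=dim)

ff = torch.nn.Linear(8, 8)
x = torch.randn(2, 24, 8)  # (batch, frames, channels)

full = ff(x)
chunked = chunked_forward(ff, x, chunk_size=1, dim=1)
assert torch.allclose(full, chunked, atol=1e-6)  # same result, lower peak memory
```

This only works for modules that act on each position independently, which is why it is applied to the feed-forward blocks.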
For more details, check out the official docs.
It was contributed by @patrickvonplaten in #3900.
All commits
- remove seed by @yiyixuxu in #3734
- Correct Token to upload docs by @patrickvonplaten in #3744
- Correct another push token by @patrickvonplaten in #3745
- [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749
- [Documentation] Replace dead link to Flax install guide by @JeLuF in #3739
- [documentation] grammatical fixes in installation.mdx by @LiamSwayne in #3735
- Text2video zero refinements by @19and99 in #3733
- [Tests] Relax tolerance of flaky failing test by @patrickvonplaten in #3755
- [MultiControlNet] Allow save and load by @patrickvonplaten in #3747
- Update pipeline_flax_stable_diffusion_controlnet.py by @jfozard in #3306
- update conversion script for Kandinsky unet by @yiyixuxu in #3766
- [docs] Fix Colab notebook cells by @stevhliu in #3777
- [Bug Report template] modify the issue template to include core maintainers. by @sayakpaul in #3785
- [Enhance] Update reference by @okotaku in #3723
- Fix broken cpu-offloading in legacy inpainting SD pipeline by @cmdr2 in #3773
- Fix some bad comment in training scripts by @patrickvonplaten in #3798
- Added LoRA loading to `StableDiffusionKDiffusionPipeline` by @tripathiarpan20 in #3751
- UnCLIP Image Interpolation -> Keep same initial noise across interpolation steps by @Abhinay1997 in #3782
- feat: add PR template. by @sayakpaul in #3786
- Ldm3d first PR by @estelleafl in #3668
- Complete set_attn_processor for prior and vae by @patrickvonplaten in #3796
- fix typo by @Isotr0py in #3800
- manual check for checkpoints_total_limit instead of using accelerate by @williamberman in #3681
- [train text to image] add note to loading from checkpoint by @williamberman in #3806
- device map legacy attention block weight conversion by @williamberman in #3804
- [docs] Zero SNR by @stevhliu in #3776
- [ldm3d] Fixed small typo by @estelleafl in #3820
- [Examples] Improve the model card pushed from the `train_text_to_image.py` script by @sayakpaul in #3810
- [Docs] add missing pipelines from the overview pages and minor fixes by @sayakpaul in #3795
- [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models by @AndyShih12 in #3716
- Update control_brightness.mdx by @dqueue in #3825
- Support ControlNet models with different number of channels in control images by @JCBrouwer in #3815
- Add ddpm kandinsky by @yiyixuxu in #3783
- [docs] More API stuff by @stevhliu in #3835
- relax tol attention conversion test by @williamberman in #3842
- fix: random module seeding by @sayakpaul in #3846
- fix audio_diffusion tests by @teticio in #3850
- Correct bad attn naming by @patrickvonplaten in #3797
- [Conversion] Small fixes by @patrickvonplaten in #3848
- Fix some audio tests by @patrickvonplaten in #3841
- [Docs] add: contributor note in the paradigms docs. by @sayakpaul in #3852
- Update Habana Gaudi doc by @regisss in #3863
- Add guidance start/stop by @holwech in #3770
- feat: rename single-letter vars in `resnet.py` by @SauravMaheshkar in #3868
- Fixing the global_step key not found by @VincentNeemie in #3844
- Support for manual CLIP loading in StableDiffusionPipeline - txt2img. by @WadRex in #3832
- fix sde add noise typo by @UranusITS in #3839
- [Tests] add test for checking soft dependencies. by @sayakpaul in #3847
- [Enhance] Add LoRA rank args in train_text_to_image_lora by @okotaku in #3866
- [docs] Model API by @stevhliu in #3562
- fix/docs: Fix the broken doc links by @Aisuko in #3897
- Add video img2img by @patrickvonplaten in #3900
- fix/doc-code: Updating to the latest version parameters by @Aisuko in #3924
- fix/doc: no import torch issue by @Aisuko in #3923
- Correct controlnet out of list error by @patrickvonplaten in #3928
- Adding better way to define multiple concepts and also validation capabilities. by @mauricio-repetto in #3807
- [ldm3d] Update code to be functional with the new checkpoints by @estelleafl in #3875
- Improve memory text to video by @patrickvonplaten in #3930
- revert automatic chunking by @patrickvonplaten in #3934
- avoid upcasting by assigning dtype to noise tensor by @prathikr in #3713
- Fix failing np tests by @patrickvonplaten in #3942
- Add `timestep_spacing` and `steps_offset` to schedulers by @pcuenca in #3947
- Add Consistency Models Pipeline by @dg845 in #3492
- Update consistency_models.mdx by @sayakpaul in #3961
- Make `UNet2DConditionOutput` pickle-able by @prathikr in #3857
- [Consistency Models] correct checkpoint url in the doc by @sayakpaul in #3962
- [Text-to-video] Add `torch.compile()` compatibility by @sayakpaul in #3949
- [SD-XL] Add new pipelines by @patrickvonplaten in #3859
- Kandinsky 2.2 by @cene555 in #3903
- Add Shap-E by @yiyixuxu in #3742
- disable num attenion heads by @patrickvonplaten in #3969
- Improve SD XL by @patrickvonplaten in #3968
- fix/doc-code: import torch and fix the broken document address by @Aisuko in #3941
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @estelleafl
- Ldm3d first PR (#3668)
- @AndyShih12
- [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models (#3716)
- @dg845
- Add Consistency Models Pipeline (#3492)