Shap-E
Shap-E is a 3D generation model from OpenAI, introduced in Shap-E: Generating Conditional 3D Implicit Functions. Diffusers supports both text-to-3D and image-to-3D generation with it.
Text to 3D
```python
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

ckpt_id = "openai/shap-e"
pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")

guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "cake_3d.gif")
```
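The `guidance_scale=15.0` above controls classifier-free guidance: the model is run on both an unconditional and a text-conditioned input, and the final prediction is extrapolated from the unconditional one toward the conditioned one. A minimal sketch of that combination step (the tensor names here are illustrative, not the pipeline's internals):

```python
import torch

def apply_cfg(noise_pred_uncond, noise_pred_cond, guidance_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional output and toward the text-conditioned one.
    return noise_pred_uncond + guidance_scale * (noise_pred_cond - noise_pred_uncond)

uncond = torch.zeros(2, 4)
cond = torch.ones(2, 4)

# guidance_scale=1.0 recovers the conditional prediction unchanged;
# larger values amplify the conditional direction.
weak = apply_cfg(uncond, cond, 1.0)
strong = apply_cfg(uncond, cond, 15.0)
```

Higher values follow the prompt more closely at the cost of sample diversity.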
Image to 3D
```python
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"
image = load_image(img_url)

generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0
images = pipe(
    image,
    num_images_per_prompt=batch_size,
    generator=generator,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
    output_type="pil",
).images

gif_path = export_to_gif(images[0], "burger_sampled_3d.gif")
```
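`export_to_gif` takes the list of rendered frames (PIL images) and writes them out as an animated GIF. A rough standalone equivalent using only Pillow, with dummy frames and an illustrative frame duration:

```python
from PIL import Image

def frames_to_gif(frames, path, duration_ms=100):
    # Save the first frame and append the rest; loop=0 repeats forever.
    frames[0].save(
        path,
        save_all=True,
        append_images=frames[1:],
        duration=duration_ms,
        loop=0,
    )
    return path

# Dummy solid-color frames standing in for the pipeline's rendered views
frames = [Image.new("RGB", (64, 64), (i * 4, 0, 0)) for i in range(64)]
frames_to_gif(frames, "demo.gif")
```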
For more details, check out the official documentation.
The model was contributed by @yiyixuxu in #3742.
Consistency models
Consistency models are diffusion-related generative models that support fast one-step or few-step image generation. They were proposed by OpenAI in Consistency Models.
```python
import torch
from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# One-step sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# One-step sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multi-step sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")
```
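Like other Diffusers pipelines, `ConsistencyModelPipeline` accepts a `generator` argument so samples are reproducible across runs. The determinism comes from seeding a `torch.Generator`; a quick CPU-only illustration of that mechanism (the shapes are arbitrary):

```python
import torch

def sample_noise(seed, shape=(2, 3)):
    # A freshly seeded generator makes torch.randn deterministic.
    generator = torch.Generator().manual_seed(seed)
    return torch.randn(shape, generator=generator)

a = sample_noise(0)
b = sample_noise(0)
c = sample_noise(1)
assert torch.equal(a, b)      # same seed -> identical noise
assert not torch.equal(a, c)  # different seed -> different noise
```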
For more details, see the official docs.
The model was contributed by our community members @dg845 and @ayushtues in #3492.
Video-to-Video
Earlier video generation pipelines tended to produce watermarked videos because watermarks were present in their pretraining data. With the newly added checkpoints below, we can now generate watermark-free videos:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimizations
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)
```
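`enable_forward_chunking(chunk_size=1, dim=1)` trades speed for memory by running feed-forward layers on small slices along the frame dimension rather than on the whole sequence at once. A toy version of the idea (the module and shapes here are made up for illustration, not the UNet's actual layers):

```python
import torch

def chunked_forward(module, hidden_states, chunk_size=1, dim=1):
    # Split the input along `dim`, run each chunk through the module
    # separately, and stitch the outputs back together. Peak activation
    # memory now scales with the chunk size instead of the full length.
    num_chunks = hidden_states.shape[dim] // chunk_size
    chunks = hidden_states.chunk(num_chunks, dim=dim)
    return torch.cat([module(chunk) for chunk in chunks], dim=dim)

ff = torch.nn.Linear(8, 8)
x = torch.randn(2, 24, 8)  # (batch, frames, channels)

full = ff(x)
chunked = chunked_forward(ff, x, chunk_size=1, dim=1)
assert torch.allclose(full, chunked, atol=1e-6)  # same result, lower peak memory
```

This only works for modules that act on each position independently, which is why it is applied to the feed-forward blocks.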
For more details, check out the official docs.
It was contributed by @patrickvonplaten in #3900.
All commits
- remove seed by @yiyixuxu in #3734
- Correct Token to upload docs by @patrickvonplaten in #3744
- Correct another push token by @patrickvonplaten in #3745
- [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749
- [Documentation] Replace dead link to Flax install guide by @JeLuF in #3739
- [documentation] grammatical fixes in installation.mdx by @LiamSwayne in #3735
- Text2video zero refinements by @19and99 in #3733
- [Tests] Relax tolerance of flaky failing test by @patrickvonplaten in #3755
- [MultiControlNet] Allow save and load by @patrickvonplaten in #3747
- Update pipeline_flax_stable_diffusion_controlnet.py by @jfozard in #3306
- update conversion script for Kandinsky unet by @yiyixuxu in #3766
- [docs] Fix Colab notebook cells by @stevhliu in #3777
- [Bug Report template] modify the issue template to include core maintainers. by @sayakpaul in #3785
- [Enhance] Update reference by @okotaku in #3723
- Fix broken cpu-offloading in legacy inpainting SD pipeline by @cmdr2 in #3773
- Fix some bad comment in training scripts by @patrickvonplaten in #3798
- Added LoRA loading to `StableDiffusionKDiffusionPipeline` by @tripathiarpan20 in #3751
- UnCLIP Image Interpolation -> Keep same initial noise across interpolation steps by @Abhinay1997 in #3782
- feat: add PR template. by @sayakpaul in #3786
- Ldm3d first PR by @estelleafl in #3668
- Complete set_attn_processor for prior and vae by @patrickvonplaten in #3796
- fix typo by @Isotr0py in #3800
- manual check for checkpoints_total_limit instead of using accelerate by @williamberman in #3681
- [train text to image] add note to loading from checkpoint by @williamberman in #3806
- device map legacy attention block weight conversion by @williamberman in #3804
- [docs] Zero SNR by @stevhliu in #3776
- [ldm3d] Fixed small typo by @estelleafl in #3820
- [Examples] Improve the model card pushed from the `train_text_to_image.py` script by @sayakpaul in #3810
- [Docs] add missing pipelines from the overview pages and minor fixes by @sayakpaul in #3795
- [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models by @AndyShih12 in #3716
- Update control_brightness.mdx by @dqueue in #3825
- Support ControlNet models with different number of channels in control images by @JCBrouwer in #3815
- Add ddpm kandinsky by @yiyixuxu in #3783
- [docs] More API stuff by @stevhliu in #3835
- relax tol attention conversion test by @williamberman in #3842
- fix: random module seeding by @sayakpaul in #3846
- fix audio_diffusion tests by @teticio in #3850
- Correct bad attn naming by @patrickvonplaten in #3797
- [Conversion] Small fixes by @patrickvonplaten in #3848
- Fix some audio tests by @patrickvonplaten in #3841
- [Docs] add: contributor note in the paradigms docs. by @sayakpaul in #3852
- Update Habana Gaudi doc by @regisss in #3863
- Add guidance start/stop by @holwech in #3770
- feat: rename single-letter vars in `resnet.py` by @SauravMaheshkar in #3868
- Fixing the global_step key not found by @VincentNeemie in #3844
- Support for manual CLIP loading in StableDiffusionPipeline - txt2img. by @WadRex in #3832
- fix sde add noise typo by @UranusITS in #3839
- [Tests] add test for checking soft dependencies. by @sayakpaul in #3847
- [Enhance] Add LoRA rank args in train_text_to_image_lora by @okotaku in #3866
- [docs] Model API by @stevhliu in #3562
- fix/docs: Fix the broken doc links by @Aisuko in #3897
- Add video img2img by @patrickvonplaten in #3900
- fix/doc-code: Updating to the latest version parameters by @Aisuko in #3924
- fix/doc: no import torch issue by @Aisuko in #3923
- Correct controlnet out of list error by @patrickvonplaten in #3928
- Adding better way to define multiple concepts and also validation capabilities. by @mauricio-repetto in #3807
- [ldm3d] Update code to be functional with the new checkpoints by @estelleafl in #3875
- Improve memory text to video by @patrickvonplaten in #3930
- revert automatic chunking by @patrickvonplaten in #3934
- avoid upcasting by assigning dtype to noise tensor by @prathikr in #3713
- Fix failing np tests by @patrickvonplaten in #3942
- Add `timestep_spacing` and `steps_offset` to schedulers by @pcuenca in #3947
- Add Consistency Models Pipeline by @dg845 in #3492
- Update consistency_models.mdx by @sayakpaul in #3961
- Make `UNet2DConditionOutput` pickle-able by @prathikr in #3857
- [Consistency Models] correct checkpoint url in the doc by @sayakpaul in #3962
- [Text-to-video] Add `torch.compile()` compatibility by @sayakpaul in #3949
- [SD-XL] Add new pipelines by @patrickvonplaten in #3859
- Kandinsky 2.2 by @cene555 in #3903
- Add Shap-E by @yiyixuxu in #3742
- disable num attenion heads by @patrickvonplaten in #3969
- Improve SD XL by @patrickvonplaten in #3968
- fix/doc-code: import torch and fix the broken document address by @Aisuko in #3941
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @estelleafl
- Ldm3d first PR (#3668)
- @AndyShih12
- [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models (#3716)
- @dg845
- Add Consistency Models Pipeline (#3492)