Shap-E
Shap-E is a model from OpenAI for generating 3D assets, introduced in Shap-E: Generating Conditional 3D Implicit Functions.
Diffusers supports both text-to-3D and image-to-3D generation with Shap-E.
Text to 3D
```python
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

ckpt_id = "openai/shap-e"
pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")

guidance_scale = 15.0
prompt = "a shark"
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    size=256,
).images

gif_path = export_to_gif(images, "shark_3d.gif")
```
Image to 3D
```python
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

img_url = "https://hf.co/datasets/diffusers/docs-images/resolve/main/shap-e/corgi.png"
image = load_image(img_url)

generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0

images = pipe(
    image,
    num_images_per_prompt=batch_size,
    generator=generator,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    size=256,
    output_type="pil",
).images

gif_path = export_to_gif(images, "corgi_sampled_3d.gif")
```
For more details, check out the official documentation.
The model was added by @yiyixuxu in #3742.
Consistency models
Consistency models support fast one-step or few-step image generation. They were proposed by OpenAI in Consistency Models.
```python
import torch
from diffusers import ConsistencyModelPipeline

device = "cuda"
# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# One-step sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# One-step sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be specified explicitly; the timesteps below are from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")
```
For more details, see the official docs.
The model was added by community contributors @dg845 and @ayushtues in #3492.
Stable Diffusion XL 0.9 Research Preview
If you have access to the Stable Diffusion XL 0.9 weights, you can use the following code to experiment with our new StableDiffusionXLPipeline:
```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

use_refiner = True

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

if use_refiner:
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
    )
    refiner.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# Return latents when the refiner will perform the final decoding step.
image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0]

if use_refiner:
    image = refiner(prompt=prompt, image=image[None, :]).images[0]

image.save("image.png")
```
It was introduced by @patrickvonplaten in #3859.
Video-to-Video
We can now generate watermark-free videos with the new zeroscope checkpoints:
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)
```
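The generated frames can then be upscaled with a video-to-video pass. The sketch below assumes the cerspense/zeroscope_v2_XL checkpoint and the `video`/`strength` arguments of the diffusers video-to-video API; treat it as an illustration rather than the canonical recipe:

```python
from PIL import Image
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Assumption: cerspense/zeroscope_v2_XL is used as the matching upscaler checkpoint.
upscale = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_XL", torch_dtype=torch.float16)
upscale.enable_model_cpu_offload()
upscale.enable_vae_slicing()

# Resize the low-resolution frames before the img2img pass.
video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]

# `strength` controls how far the upscaler may deviate from the input video.
video_frames = upscale(prompt, video=video, strength=0.6).frames
video_path = export_to_video(video_frames)
```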
For more details, check out the official docs.
It was contributed by @patrickvonplaten in #3900.
All commits
- remove seed by @yiyixuxu in #3734
- Correct Token to upload docs by @patrickvonplaten in #3744
- Correct another push token by @patrickvonplaten in #3745
- [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749
- [Documentation] Replace dead link to Flax install guide by @JeLuF in #3739
- [documentation] grammatical fixes in installation.mdx by @LiamSwayne in #3735
- Text2video zero refinements by @19and99 in #3733
- [Tests] Relax tolerance of flaky failing test by @patrickvonplaten in #3755
- [MultiControlNet] Allow save and load by @patrickvonplaten in #3747
- Update pipeline_flax_stable_diffusion_controlnet.py by @jfozard in #3306
- update conversion script for Kandinsky unet by @yiyixuxu in #3766
- [docs] Fix Colab notebook cells by @stevhliu in #3777
- [Bug Report template] modify the issue template to include core maintainers. by @sayakpaul in #3785
- [Enhance] Update reference by @okotaku in #3723
- Fix broken cpu-offloading in legacy inpainting SD pipeline by @cmdr2 in #3773
- Fix some bad comment in training scripts by @patrickvonplaten in #3798
- Added LoRA loading to `StableDiffusionKDiffusionPipeline` by @tripathiarpan20 in #3751
- UnCLIP Image Interpolation -> Keep same initial noise across interpolation steps by @Abhinay1997 in #3782
- feat: add PR template. by @sayakpaul in #3786
- Ldm3d first PR by @estelleafl in #3668
- Complete set_attn_processor for prior and vae by @patrickvonplaten in #3796
- fix typo by @Isotr0py in #3800
- manual check for checkpoints_total_limit instead of using accelerate by @williamberman in #3681
- [train text to image] add note to loading from checkpoint by @williamberman in #3806
- device map legacy attention block weight conversion by @williamberman in #3804
- [docs] Zero SNR by @stevhliu in #3776
- [ldm3d] Fixed small typo by @estelleafl in #3820
- [Examples] Improve the model card pushed from the `train_text_to_image.py` script by @sayakpaul in #3810
- [Docs] add missing pipelines from the overview pages and minor fixes by @sayakpaul in #3795
- [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models by @AndyShih12 in #3716
- Update control_brightness.mdx by @dqueue in #3825
- Support ControlNet models with different number of channels in control images by @JCBrouwer in #3815
- Add ddpm kandinsky by @yiyixuxu in #3783
- [docs] More API stuff by @stevhliu in #3835
- relax tol attention conversion test by @williamberman in #3842
- fix: random module seeding by @sayakpaul in #3846
- fix audio_diffusion tests by @teticio in #3850
- Correct bad attn naming by @patrickvonplaten in #3797
- [Conversion] Small fixes by @patrickvonplaten in #3848
- Fix some audio tests by @patrickvonplaten in #3841
- [Docs] add: contributor note in the paradigms docs. by @sayakpaul in #3852
- Update Habana Gaudi doc by @regisss in #3863
- Add guidance start/stop by @holwech in #3770
- feat: rename single-letter vars in `resnet.py` by @SauravMaheshkar in #3868
- Fixing the global_step key not found by @VincentNeemie in #3844
- Support for manual CLIP loading in StableDiffusionPipeline - txt2img. by @WadRex in #3832
- fix sde add noise typo by @UranusITS in #3839
- [Tests] add test for checking soft dependencies. by @sayakpaul in #3847
- [Enhance] Add LoRA rank args in train_text_to_image_lora by @okotaku in #3866
- [docs] Model API by @stevhliu in #3562
- fix/docs: Fix the broken doc links by @Aisuko in #3897
- Add video img2img by @patrickvonplaten in #3900
- fix/doc-code: Updating to the latest version parameters by @Aisuko in #3924
- fix/doc: no import torch issue by @Aisuko in #3923
- Correct controlnet out of list error by @patrickvonplaten in #3928
- Adding better way to define multiple concepts and also validation capabilities. by @mauricio-repetto in #3807
- [ldm3d] Update code to be functional with the new checkpoints by @estelleafl in #3875
- Improve memory text to video by @patrickvonplaten in #3930
- revert automatic chunking by @patrickvonplaten in #3934
- avoid upcasting by assigning dtype to noise tensor by @prathikr in #3713
- Fix failing np tests by @patrickvonplaten in #3942
- Add `timestep_spacing` and `steps_offset` to schedulers by @pcuenca in #3947
- Add Consistency Models Pipeline by @dg845 in #3492
- Update consistency_models.mdx by @sayakpaul in #3961
- Make `UNet2DConditionOutput` pickle-able by @prathikr in #3857
- [Consistency Models] correct checkpoint url in the doc by @sayakpaul in #3962
- [Text-to-video] Add `torch.compile()` compatibility by @sayakpaul in #3949
- [SD-XL] Add new pipelines by @patrickvonplaten in #3859
- Kandinsky_v22_yiyi by @yiyixuxu in #3936
- Add Shap-E by @yiyixuxu in #3742
- disable num attention heads by @patrickvonplaten in #3969
- Improve SD XL by @patrickvonplaten in #3968
- fix/doc-code: import torch and fix the broken document address by @Aisuko in #3941
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @estelleafl
    - Ldm3d first PR (#3668)
- @AndyShih12
    - [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models (#3716)
- @dg845
    - Add Consistency Models Pipeline (#3492)