invoke-ai/InvokeAI v4.2.5

Latest release: v4.2.5post1 (pre-release)

🚨 macOS users may get black images when using LoRAs or IP Adapters. Users with CUDA GPUs may get unexpected OOMs. We are investigating. 🚨

v4.2.5 includes a handful of fixes and improvements, plus one exciting beta node - tiled upscaling via MultiDiffusion.

If you missed v4.2.0, please review its release notes to get up to speed on Control Layers.

Tiled Upscaling via MultiDiffusion

MultiDiffusion is a fairly straightforward technique for tiled denoising. The gist is similar to other tiled upscaling methods - split the input image up into tiles, process each independently, and stitch them back together. The main innovation of MultiDiffusion is to do this in latent space, blending the tensors together continually. This results in excellent consistency across the output image, with no seams.
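To make the blending idea concrete, here's a minimal sketch of one MultiDiffusion-style pass in NumPy. This is an illustration, not InvokeAI's implementation (which operates inside the scheduler loop on torch tensors); the function and parameter names are hypothetical.

```python
import numpy as np

def multidiffusion_step(latents, tile_size, overlap, denoise_tile):
    """One MultiDiffusion-style blending pass: denoise overlapping latent
    tiles independently, then average the results back onto one canvas."""
    h, w = latents.shape[-2:]
    out = np.zeros_like(latents)
    counts = np.zeros((h, w))          # how many tiles covered each pixel
    stride = tile_size - overlap
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            # clamp so the last tile hugs the edge instead of running past it
            y0, x0 = min(y, h - tile_size), min(x, w - tile_size)
            tile = latents[..., y0:y0 + tile_size, x0:x0 + tile_size]
            out[..., y0:y0 + tile_size, x0:x0 + tile_size] += denoise_tile(tile)
            counts[y0:y0 + tile_size, x0:x0 + tile_size] += 1.0
    # averaging the overlaps is what blends tiles together seamlessly
    return out / counts
```

Because every denoising step re-blends the overlapping regions in latent space, neighboring tiles never drift apart - which is why the stitched output has no visible seams.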

This feature is exposed as a Tiled MultiDiffusion Denoise Latents node, currently in beta. It works much the same as the OG Denoise Latents node. Here's a workflow to get you started: sd15_multi_diffusion_esrgan_x2_upscale.json


We are still thinking about how to expose this in the linear UI. Most likely, we will expose it with very minimal settings. If you want to tweak it, use the workflow.

How to use it

This technique is fundamentally the same as normal img2img. Appropriate use of conditioning and control will greatly improve the output. The one hard requirement is to use the Tile ControlNet model.

Besides that, here are some tips from our initial testing:

  • Use detail-adding or style LoRAs.
  • Use a base model best suited for the desired output style.
  • Prompts make a difference.
  • The initial upscaling method makes a difference.
  • Scheduler makes a difference. Some produce softer outputs.

VRAM Usage

This technique can upscale images to very large sizes without substantially increasing VRAM usage beyond what you'd see for a "normal" sized generation. The VRAM bottlenecks then become the first VAE encode (Image to Latents) and final VAE decode (Latents to Image) steps.

You may run into OOM errors during these steps. The solution is to enable tiling using the toggle on the Image to Latents and Latents to Image nodes. This allows the VAE operations to be done piecewise, similar to the tiled denoising process, without using gobs of VRAM.
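The memory trade-off is easy to see in a sketch: decoding the whole latent at once allocates one huge activation, while tiled decoding only ever holds one tile's worth. The snippet below is a simplified illustration of the idea, not the diffusers implementation (which additionally blends overlapping tiles); the names are hypothetical.

```python
import numpy as np

def tiled_decode(latents, decode, tile=8):
    """Decode a latent canvas tile-by-tile to bound peak memory.
    `decode` maps a latent tile to pixels at the VAE's upscale factor."""
    h, w = latents.shape[-2:]
    rows = []
    for y in range(0, h, tile):
        # each call only materializes one tile's activations
        row = [decode(latents[..., y:y + tile, x:x + tile])
               for x in range(0, w, tile)]
        rows.append(np.concatenate(row, axis=-1))
    return np.concatenate(rows, axis=-2)
```

Since each tile is decoded with no knowledge of its neighbors, the decoder can make slightly different choices per tile - which is exactly the tile-to-tile texture and color inconsistency described below.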

There's one caveat - VAE tiling often introduces inconsistency across tiles. Textures and colors may differ from tile to tile. This is a function of the diffusers handling of VAE tiling, not the tiled denoising process introduced in v4.2.5. We are investigating ways to improve this.

Takeaway: If your GPU can handle non-tiled VAE encode and decode for a given output size, use that for best results.

📈 Patch Nodes for v4.2.5

Enhancements

  • When downloading image metadata, graphs or workflows, the JSON file includes the image name and type of data. Thanks @jstnlowe!
  • Add clear_queue_on_startup config setting to clear problematic queues. This is useful for a rare edge case where your queue is full of items that somehow crash the app. Set this to true, and the queue will clear before it has time to attempt to execute the problematic item. Thanks @steffy-lo!
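If you hit this edge case, the setting goes in your `invokeai.yaml` (the key name is from this release; exact placement in the file is an assumption - check the config docs):

```yaml
# invokeai.yaml
clear_queue_on_startup: true  # wipe the session queue before the app runs it
```

Remember to set it back to `false` (or remove it) once the bad queue is cleared, or you'll lose queued items on every startup.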
  • Performance and memory efficiency improvements for LoRA patching and model offloading.
  • Addition of simplified model installation methods to the Invocation API: download_and_cache_model, load_local_model and load_remote_model. These methods allow models to be used without first adding them to the model manager. For example, we now use these methods to load ESRGAN models.
  • Support for probing and loading SDXL VAE checkpoints.

Fixes

  • Fix handling of the 0-step denoising process.
  • If a control image's processed version is missing when the app loads, it is now re-processed.

Performance improvements

  • Improved LoRA patching.
  • Improved RAM <-> VRAM model transfer performance.

Internal changes

  • The DenoiseLatentsInvocation has had its internal methods split up to support tiled upscaling via MultiDiffusion. This included some amount of file shuffling and renaming. The invokeai package's exported classes should still be the same. Please let us know if this has broken an import for you.

💾 Installation and Updating

To install or update to v4.2.5, download the installer and follow the installation instructions.

To update, select the same installation location. Your user data (images, models, etc) will be retained.

Missing models after updating from v3 to v4

See this FAQ.

Error during installation ModuleNotFoundError: No module named 'controlnet_aux'

See this FAQ.

What's Changed

New Contributors

Full Changelog: v4.2.4...v4.2.5
