This release brings two major improvements to Invoke's memory management: partial model loading (aka Low-VRAM mode) and dynamic memory limits.
Memory Management Improvements
Thanks to @RyanJDick for designing and implementing these improved memory management features!
Partial Model Loading (Low-VRAM mode)
Invoke's previous "all or nothing" model loading strategy required your GPU to have enough VRAM to hold whole models during generation.
As a result, as image generation models grew in size and auxiliary models (e.g. ControlNet) became critical to workflows, Invoke's VRAM requirements grew at the same rate. These increased VRAM requirements have prevented many of our users from running Invoke with the latest and greatest models.
Partial model loading allows Invoke to load only the parts of the model that are actively being used onto the GPU, substantially reducing Invoke's VRAM requirements.
- Applies to systems with a CUDA device.
- Enables large models to run with limited GPU VRAM (e.g. the full 24GB FLUX dev model on an 8GB GPU).
- When a model is too large to fit on the GPU, it is partially offloaded to RAM, and its weights are streamed to the GPU for fast inference. Inference won't be as fast as when the model is fully loaded, but it will be much faster than running on the CPU (see the sketch after this list).
- The recommended minimum CUDA GPU size is 8GB. An 8GB GPU should now be capable of running all models supported by Invoke (even the full 24GB FLUX models with ControlNet).
- If there is sufficient demand, we could probably support 4GB cards in the future by moving the VAE decoding operation fully to the CPU.
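To make the streaming behavior concrete, here is a minimal, hypothetical PyTorch sketch of the general technique. It is not Invoke's actual implementation, and the `streamed_forward` helper is invented for illustration:

```python
# Minimal sketch of weight streaming (hypothetical, not Invoke's code):
# keep model weights in RAM and copy each layer to the GPU just-in-time.
import torch
import torch.nn as nn

def streamed_forward(layers: nn.ModuleList, x: torch.Tensor) -> torch.Tensor:
    """Run layers sequentially, loading each onto the GPU only while it runs."""
    x = x.to("cuda")
    for layer in layers:
        layer.to("cuda")   # stream this layer's weights into VRAM
        x = layer(x)       # compute on the GPU
        layer.to("cpu")    # evict the weights to free VRAM for the next layer
    return x
```

A real implementation would keep as many layers resident as the VRAM budget allows and overlap weight copies with compute; the version above is deliberately simplified to show the core idea.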
Dynamic Memory Limits
Previously, the amount of RAM and VRAM used for model caching were set to hard limits. Now, the amount of RAM and VRAM used is adjusted dynamically based on what's available.
For most users, this will result in more effective use of their RAM/VRAM without having to tune configuration values; a sketch of the idea follows the list below.
Users can expect:
- Faster average model load times on systems with extra memory
- Fewer out-of-memory errors when combined with Partial Model Loading
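As a rough illustration of what "dynamic" means here, the following hypothetical Python sketch derives cache limits from currently available memory rather than from fixed values. The function name and headroom parameters are invented for illustration; this is not Invoke's actual logic:

```python
# Hypothetical sketch of dynamic cache limits (not Invoke's actual code):
# size the caches from what is free right now, minus a safety headroom.
import psutil
import torch

def dynamic_cache_limits(ram_headroom_gb: float = 4.0,
                         vram_headroom_gb: float = 2.0) -> tuple[float, float]:
    """Return (ram_limit_gb, vram_limit_gb) based on available memory."""
    free_vram_bytes, _total = torch.cuda.mem_get_info()
    free_ram_bytes = psutil.virtual_memory().available
    ram_limit = max(0.0, free_ram_bytes / 1e9 - ram_headroom_gb)
    vram_limit = max(0.0, free_vram_bytes / 1e9 - vram_headroom_gb)
    return ram_limit, vram_limit
```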
Enabling Partial Model Loading and Dynamic Memory Limits
Partial Model Loading is disabled by default. To enable it, set `enable_partial_loading: true` in your `invokeai.yaml`:

```yaml
enable_partial_loading: true
```
This is highly recommended for users with limited VRAM. Users with 24GB+ of VRAM may prefer to leave this option disabled to guarantee that models are fully loaded and run at full speed.
Dynamic memory limits are enabled by default, but can be overridden by setting `ram` or `vram` in your `invokeai.yaml`:

```yaml
# Override the dynamic cache limits to ram=6GB and vram=20GB.
ram: 6
vram: 20
```
🚨 Note: Users who previously set `ram` or `vram` in their `invokeai.yaml` will need to delete these overrides in order to benefit from the new dynamic memory limits.
All Changes
- Added support for partial model loading.
- Added support for dynamic memory limits.
- Fixed issue where excessively long board names could cause performance issues.
- Reworked error handling when installing models from a URL.
- Fixed link to the `Scale` setting's support docs.
- Tidied some unused variables. Thanks @rikublock!
- Added typegen check to CI pipeline. Thanks @rikublock!
- Added stereogram nodes to Community Nodes docs. Thanks @simonfuhrmann!
- Updated installation-related docs (quick start, manual install, dev install).
Installing and Updating
The new Invoke Launcher is the recommended way to install, update and run Invoke. It takes care of a lot of details for you - like installing the right version of Python - and runs Invoke as a desktop application.
Follow the Quick Start guide to get started with the launcher.
If you already have the launcher, you can use it to update your existing install.
We've just updated the launcher to v1.2.0 with a handful of fixes. To update the launcher itself, download the latest version from the quick start guide - the download links are kept up to date.
Legacy Scripts (not recommended!)
We recommend using the launcher, as described in the previous section!
To install or update with the outdated legacy scripts 😱, download the latest legacy scripts and follow the legacy scripts instructions.
What's Changed
- Update Readme with new Installer Instructions by @hipsterusername in #7455
- docs: fix installation docs home by @psychedelicious in #7470
- docs: fix installation docs home again by @psychedelicious in #7471
- feat(ci): add typegen check workflow by @rikublock in #7463
- docs: update download links for launcher by @psychedelicious in #7489
- Add Stereogram Nodes to communityNodes.md by @simonfuhrmann in #7493
- Partial Loading PR1: Tidy ModelCache by @RyanJDick in #7492
- Partial Loading PR2: Add utils to support partial loading of models from CPU to GPU by @RyanJDick in #7494
- Partial Loading PR3: Integrate 1) partial loading, 2) quantized models, 3) model patching by @RyanJDick in #7500
- Correct Scale Informational Popover by @hipsterusername in #7499
- docs: install guides by @psychedelicious in #7508
- docs: no need to specify version for dev env setup by @psychedelicious in #7510
- feat(ui): reset canvas layers only resets the layers by @psychedelicious in #7511
- refactor(ui): mm model install error handling by @psychedelicious in #7512
- fix(api): limit board_name length to 300 characters by @maryhipp in #7515
- fix(app): remove obsolete DEFAULT_PRECISION variable by @rikublock in #7473
- Partial Loading PR 3.5: Fix pre-mature model drops from the RAM cache by @RyanJDick in #7522
- Partial Loading PR4: Enable partial loading (behind config flag) by @RyanJDick in #7505
- Partial Loading PR5: Dynamic cache ram/vram limits by @RyanJDick in #7509
- ui: translations update from weblate by @weblate in #7480
- chore: bump version to v5.6.0rc1 by @psychedelicious in #7521
New Contributors
- @simonfuhrmann made their first contribution in #7493
Full Changelog: v5.5.0...v5.6.0rc1