Second hotfix for the "offline mode is enabled" crash on model load. 0.4.4 reverted the inference-path offline guards but kept the same trap on the load path, so users who updated to 0.4.4 kept hitting the exact error the release was supposed to fix (#526). This release removes the load-path guards and patches the transformers tokenizer load to be robust to HuggingFace metadata failures at the source, so the class of bug can't recur.
Reliability
- Load no longer fails with "offline mode is enabled" (#530, fixes #526). transformers 4.57.x added an unconditional `huggingface_hub.model_info()` call inside `AutoTokenizer.from_pretrained` (via `_patch_mistral_regex`) that runs for every non-local repo load, regardless of cache state or whether the target model is actually a Mistral variant. The load-time `HF_HUB_OFFLINE` guard from 0.4.2 turned that into a hard crash for cached online users the moment 0.4.4 removed the inference-path guard that had been masking the problem. The fix wraps `_patch_mistral_regex` so any exception from the HF metadata check is caught and the tokenizer is returned unchanged, matching the success-path behavior for non-Mistral repos. The wrapper installs at `backend.backends` import time, so it covers Qwen Base, Qwen CustomVoice, TADA, and every other transformers-backed engine on Windows, Linux, and CUDA alike. The load-time `force_offline_if_cached` guards were removed: with the wrapper in place they provide zero value and only risk reintroducing the same failure mode.
- No more 30s pause when generating without a network. The HuggingFace metadata timeout called out as a known caveat in 0.4.4 is covered by the same patch; offline users no longer wait for the check to time out before load completes.
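The wrapping pattern described above can be sketched as a fail-open decorator. This is a minimal illustration, not the release's actual code: `_wrap_fail_open` is a hypothetical name, and the stand-in `_patch_mistral_regex` below only simulates the real transformers internal (whose exact module path and signature are not shown here).

```python
import functools

def _wrap_fail_open(patch_fn):
    """Wrap a tokenizer patch helper so any exception (e.g. a failed
    HuggingFace metadata lookup in offline mode) is swallowed and the
    tokenizer is returned unchanged, matching the success path for
    non-Mistral repos. Hypothetical helper name for illustration."""
    @functools.wraps(patch_fn)
    def safe_patch(tokenizer, *args, **kwargs):
        try:
            return patch_fn(tokenizer, *args, **kwargs)
        except Exception:
            # Fail open: the metadata check is advisory, so a network or
            # offline-mode error must not abort the whole model load.
            return tokenizer
    return safe_patch

# Stand-in for the transformers internal; simulates the HF metadata failure.
def _patch_mistral_regex(tokenizer):
    raise OSError("offline mode is enabled")

safe = _wrap_fail_open(_patch_mistral_regex)
tok = object()
print(safe(tok) is tok)  # tokenizer passes through untouched
```

Installing the wrapper once at import time (before any engine constructs a tokenizer) is what lets a single patch cover every transformers-backed engine.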