Release v5.5.0
New Model additions
Gemma4
Gemma 4 is a multimodal model with pretrained and instruction-tuned variants, available in 1B, 13B, and 27B parameters. The architecture is mostly the same as the previous Gemma versions. The key differences are a vision processor that can output images of fixed token budget and a spatial 2D RoPE to encode vision-specific information across height and width axis.
You can find all the original Gemma 4 checkpoints under the Gemma 4 release.
The key difference from previous Gemma releases is the new design to process images of different sizes using a fixed-budget number of tokens. Unlike many models that squash every image into a fixed square (like 224×224), Gemma 4 keeps the image's natural aspect ratio while making it the right size. There a a couple constraints to follow:
- The total number of pixels must fit within a patch budget
- Both height and width must be divisible by 48 (= patch size 16 × pooling kernel 3)
Important
Gemma 4 does not apply the standard ImageNet mean/std normalization that many other vision models use. The model's own patch embedding layer handles the final scaling internally (shifting values to the [-1, 1] range).
The number of "soft tokens" (aka vision tokens) an image processor can produce is configurable. The supported options are outlined below and the default is 280 soft tokens per image.
| Soft Tokens | Patches (before pooling) | Approx. Image Area |
|---|---|---|
| 70 | 630 | ~161K pixels |
| 140 | 1,260 | ~323K pixels |
| 280 | 2,520 | ~645K pixels |
| 560 | 5,040 | ~1.3M pixels |
| 1,120 | 10,080 | ~2.6M pixels |
To encode positional information for each patch in the image, Gemma 4 uses a learned 2D position embedding table. The position table stores up to 10,240 positions per axis, which allows the model to handle very large images. Each position is a learned vector of the same dimensions as the patch embedding. The 2D RoPE which Gemma 4 uses independently rotate half the attention head dimensions for the x-axis and the other half for the y-axis. This allows the model to understand spatial relationships like "above," "below," "left of," and "right of."
NomicBERT
NomicBERT is a BERT-inspired encoder model that applies Rotary Position Embeddings (RoPE) to create reproducible long context text embeddings. It is the first fully reproducible, open-source text embedding model with 8192 context length that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on short-context MTEB and long context LoCo benchmarks. The model generates dense vector embeddings for various tasks including search, clustering, and classification using specific instruction prefixes.
Links: Documentation | Paper
MusicFlamingo
Music Flamingo is a fully open large audio–language model designed for robust understanding and reasoning over music. It builds upon the Audio Flamingo 3 architecture by including Rotary Time Embeddings (RoTE), which injects temporal position information to enable the model to handle audio sequences up to 20 minutes. The model features a unified audio encoder across speech, sound, and music with special sound boundary tokens for improved audio sequence modeling.
Links: Documentation | Paper
Breaking changes
Mamba and hybrid model caches are now first-class native citizens in the library, so users working with Mamba-based or hybrid (Mamba + attention) models should update their code to use the new native cache classes instead of any previous workarounds.
- 🚨 [Cache] Native mamba & hybrid cache (#44950) by @Cyrilvallez
Remote code execution support has been removed from the native LightGlue integration, so users who were loading LightGlue with trust_remote_code=True must remove that argument and use the model directly through the standard native API.
Vision
Several vision-related bugs were fixed in this release, including correcting the Gemma vision mask to support video inputs, resolving a dependency issue that incorrectly required torchvision for PIL-based image processors, and patching bugs in the Janus image generation model and image loading. Local code resolution for tokenizers and image processors was also corrected.
- Generalize gemma vision mask to videos (#45185) by @zucchini-nlp in [#45185]
- Fix explicit local code resolution for tokenizers and image processors (#45169) by @hmellor in [#45169]
- fix bug for janus model image generation (#45044) by @kaixuanliu in [#45044]
- [Bugfix] Remove incorrect torchvision requirement from PIL backend image processors (#45045) by @Lidang-Jiang in [#45045]
- Avoid
Image.openfailure (#44645) by @sywangyi in [#44645]
Cache
Improved the performance of repository checks (check-repo) by introducing file-level and AST-level disk caching, achieving up to a 27x speedup (from ~46s to ~1.6s with a warm cache), and fixed the mlinter cache location in .gitignore.
- refactoring: speedup static checks with disk cache (#44992) by @tarekziade in [#44992]
- refactor: added cache in check_repo (#45012) by @tarekziade in [#45012]
- chore: Fix mlinter cache location (#45052) by @tarekziade in [#45052]
Bugfixes and improvements
- Fix resized LM head weights being overwritten by post_init (#45079) by @javierdejesusda in [#45079]
- [Qwen3.5 MoE] Add _tp_plan to ForConditionalGeneration (#45124) by @danielquintas8 in [#45124]
- fix(models): Fix dtype mismatch in SwitchTransformers and TimmWrapperModel (#45074) by @harshaljanjani in [#45074]
- [misc] fix qwen35 tests: correct the text model type and skip reverse_mapping (#45173) by @JJJYmmm in [#45173]
- 🔒 Pin GitHub Actions to commit SHAs (#45180) by @paulinebm in [#45180]
- Use doc-builder runnable example for GLM-ASR (#44277) by @tarekziade in [#44277]
- CI] Small T5 expectations updated (#45138) by @Abdennacer-Badaoui in [#45138]
- fix: correct type annotations across config classes for @strict validation (#45007) by @Krishnachaitanyakc in [#45007]
- Fix T5Attention shape mismatch under Tensor Parallelism (#45109) by @aws-zhanxun in [#45109]
- [refactor] Serving into proper modules (#44796) by @SunMarc in [#44796]
- Re-add regex substitutions to the response parsing spec (#45166) by @Rocketknight1 in [#45166]
- Fix incorrect TrainingArguments example in training.md (#45150) by @maanas1234 in [#45150]
- Add parse_response to Processor, make it a bit more official (#45143) by @Rocketknight1 in [#45143]
- DeepGEMM (#44832) by @IlyasMoutawwakil in [#44832]
- fix: prefer registered config over remote code in AutoConfig.from_pretrained (#45094) by @HanFa in [#45094]
- [serving] Fix continuous batching JSON response serialization (#45057) by @NathanHB in [#45057]
- Fix stupid test fetcher (#45140) by @ydshieh in [#45140]
- [CB] Add warmup feature (#45112) by @remi-or in [#45112]
- feature: added import complexity checker (#45013) by @tarekziade in [#45013]
- Fix tests for
janusmodel (#44739) by @kaixuanliu in [#44739] - CB improvements for serving (#45063) by @SunMarc in [#45063]
- [docs] continuous batching (#44896) by @stevhliu in [#44896]
- Fix few issues in Qwen_3_Omni_Moe (#44848) by @Sai-Suraj-27 in [#44848]
- Fix TypeError in rope validation when ignore_keys is a list (#45069) by @Fr0do in [#45069]
- Remove unused TensorFlow env var (#45065) by @Sai-Suraj-27 in [#45065]
- fix: add identity reverse_op to dequantize ops for save_pretrained (#44983) by @Hyungkeun-Park-Nota in [#44983]
- Fix when RoPE params are in kwargs (#45049) by @zucchini-nlp in [#45049]
- chore: update update_metdata.yml (#45054) by @hf-security-analysis[bot] in [#45054]
- [
FA] Fix BC support for a few versions + add deprecation cycle (#45061) by @vasqu in [#45061] - fix(testing): Fix Parakeet, Evolla, Pi0, and Phi-3 test failures on main CI (#45004) by @harshaljanjani in [#45004]
- Allow advanced users to override
model_typeinAutoConfig.from_pretrained(#45058) by @hmellor in [#45058] - Fix failing
SmolLM3IntegrationTest(#45048) by @Sai-Suraj-27 in [#45048] - chore: remove old extras (#45024) by @tarekziade in [#45024]
- Embedding VLMs don't need a head (#45000) by @zucchini-nlp in [#45000]
- Fix GraniteConfig type hints to accept int for multiplier fields (#45019) by @javierdejesusda in [#45019]
- fix: preserve rotary_pct across save/load cycle in GPTNeoX configs (#44985) by @Krishnachaitanyakc in [#44985]
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @ed22699
- Internalise the NomicBERT model (#43067)
- @tarekziade
- Use doc-builder runnable example for GLM-ASR (#44277)
- refactoring: speedup static checks with disk cache (#44992)
- feature: added import complexity checker (#45013)
- refactor: added cache in check_repo (#45012)
- chore: remove old extras (#45024)
- chore: Fix mlinter cache location (#45052)
- refactor: speed up docstring checker (#45009)
- @Krishnachaitanyakc
- @lashahub
- Add Music Flamingo (#43538)
- @Lidang-Jiang
- [Bugfix] Remove incorrect torchvision requirement from PIL backend image processors (#45045)