What's Changed
This release candidate was focused mostly on quantization support with the new dynamic weight loader, and a few notable 🚨 breaking changes 🚨:
- Default dtype for any model when using `from_pretrained` is now `auto`! (An opt-out sketch for both new defaults follows this list.)
  - Default auto 🚨 🚨 by @ArthurZucker in #42805
- Default shard size when saving a model is now 50GB:
  - 🚨🚨 [saving] Default to 50GB shards, and remove non-safe serialization by @Cyrilvallez in #42734

  Thanks to xet, saving is as fast as before, and larger shards are just more convenient on the Hub.
- Kwargs. They are fundamental to enable integration with vLLM and other tools:
  - Every model forward() should have **kwargs by @Rocketknight1 in #42603
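If you relied on the old defaults, here is a minimal opt-out sketch; the checkpoint id is only a placeholder, and `dtype` / `max_shard_size` are the standard `from_pretrained` / `save_pretrained` keyword arguments:

```python
import torch
from transformers import AutoModelForCausalLM

# dtype now defaults to "auto" (follow the checkpoint); pass an explicit
# dtype to restore the previous float32 behavior.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-id",      # placeholder checkpoint
    dtype=torch.float32,
)

# Shards now default to 50GB when saving; pass max_shard_size for smaller ones.
model.save_pretrained("./local-dir", max_shard_size="5GB")
```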
Dynamic weight loader updates:
Mostly QoL improvements and fixes, plus restored support for CPU offloading (see the sketch after this list).
- mark params as _is_hf_initialized with DS Zero3 from weight conversion by @winglian in #42626
- [loading] Allow loading to happen without threading by @Cyrilvallez in #42619
- [loading] Correctly load params during offloading & careful memory considerations by @Cyrilvallez in #42632
- allow registration of custom checkpoint conversion mappings by @winglian in #42634
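As a rough sketch of the offloading path these PRs touch, loading with `device_map="auto"` places what fits on GPU and offloads the rest to CPU; the checkpoint id and memory limits below are placeholders:

```python
from transformers import AutoModelForCausalLM

# device_map="auto" places weights on GPU and offloads the remainder to CPU;
# this is the loading path where CPU offloading support was restored.
model = AutoModelForCausalLM.from_pretrained(
    "org/large-model-id",                     # placeholder checkpoint
    device_map="auto",
    max_memory={0: "20GiB", "cpu": "60GiB"},  # illustrative limits only
)
```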
New models (a minimal loading sketch follows the list):
- Add FastVLM by @camilla-deckard in #41112
- Lasr model by @eustlb in #42648
- [Model] Add PaddleOCR-VL Model Support by @zhang-prog in #42178
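The new models go through the usual Auto classes. Below is a minimal sketch for PaddleOCR-VL; the checkpoint id and the exact Auto class are assumptions here, so check the model docs for the real names:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

checkpoint = "PaddlePaddle/PaddleOCR-VL"  # assumed checkpoint id, see the model card
processor = AutoProcessor.from_pretrained(checkpoint)
model = AutoModelForImageTextToText.from_pretrained(checkpoint, device_map="auto")
```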
Some notable quantization fixes:
Mostly fixes for fbgemm, quanto, eetq, and FP8 (a short usage sketch follows the list):
- Fix fp8 + some enhancement by @SunMarc in #42455
- Fix eetq quanto quant methods by @SunMarc in #42557
- [Quantization] per tensor quantization kernel by @MekkCyber in #42560
- [Quantization] fix fbgemm by @MekkCyber in #42561
- [Quantization] Fix FP8 experts replacing by @MekkCyber in #42654
- [Quantization] Fix Static FP8 Quantization by @MekkCyber in #42775
- [core] fix fp-quant by @MekkCyber in #42613
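For reference, a minimal sketch of the FP8-through-fbgemm path that several of these fixes touch; the checkpoint id is a placeholder and `fbgemm-gpu` needs to be installed for this path to be selected:

```python
from transformers import AutoModelForCausalLM, FbgemmFp8Config

# Quantize weights to FP8 via fbgemm at load time.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-id",                        # placeholder checkpoint
    quantization_config=FbgemmFp8Config(),
    device_map="auto",
)
```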
PEFT:
The dynamic weight loader broke a few small things; these fixes restore the PEFT integration for all models except MoEs (a short loading sketch follows the list).
- FIX Error when trying to load non-LoRA PEFT by @BenjaminBossan in #42663
- Fix PEFT integration with new weight loader by @Cyrilvallez in #42701
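A minimal sketch of the transformers-side PEFT entry point these fixes restore; both ids are placeholders and `peft` must be installed:

```python
from transformers import AutoModelForCausalLM

# Load a base model, then attach a PEFT (e.g. LoRA) adapter through the
# transformers integration.
model = AutoModelForCausalLM.from_pretrained("org/base-model-id")  # placeholder
model.load_adapter("org/lora-adapter-id")                          # placeholder adapter
```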
Misc
Tokenization needed more refactoring; this time it's a lot cleaner!
- Refactor-tokenization-more by @ArthurZucker in #42563
- Only default `rope_parameters` to empty `dict` if there is something to put in it by @hmellor in #42651
We omitted a lot of other commits for clarity, but thanks to everyone and the new contributors!
New Contributors
- @camilla-deckard made their first contribution in #41112
- @Aaraviitkgp made their first contribution in #42466
- @ngazagna-qc made their first contribution in #40691
- @arrdel made their first contribution in #42577
- @marconaguib made their first contribution in #42587
- @Xiao-Chenguang made their first contribution in #42436
- @Furkan-rgb made their first contribution in #42465
- @mertunsall made their first contribution in #42615
- @anranlee99 made their first contribution in #42438
- @UserChen666 made their first contribution in #42335
- @efazal made their first contribution in #41723
- @Harrisonyong made their first contribution in #36416
- @hawon223 made their first contribution in #42384
- @Bissmella made their first contribution in #42647
- @AgainstEntropy made their first contribution in #42689
- @dongluw made their first contribution in #42642
- @hqkqn32 made their first contribution in #42620
- @zhang-prog made their first contribution in #42178
Full Changelog: v5.0.0rc0...v5.0.0rc1