What's Changed
This release candidate is focused on fixing AutoTokenizer, expanding dynamic weight loading support, and improving performance with MoEs!
MoEs and performance:
- batched and grouped experts implementations by @IlyasMoutawwakil in #42697
- Optimize MoEs for decoding using batched_mm by @IlyasMoutawwakil in #43126
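The gain from batched expert execution can be illustrated with a small NumPy sketch (illustrative only, not the library's actual kernels): instead of looping over tokens and calling one matmul per routed expert, the expert weights for each token are gathered into a single tensor and processed with one batched matmul.

```python
import numpy as np

rng = np.random.default_rng(0)
E, T, d_in, d_out = 4, 8, 16, 32           # experts, tokens, dims
W = rng.standard_normal((E, d_in, d_out))  # stacked expert weights
x = rng.standard_normal((T, d_in))         # token activations
assign = rng.integers(0, E, size=T)        # top-1 expert per token (hypothetical routing)

# Naive: one matmul per token, dispatched to that token's expert.
loop_out = np.stack([x[t] @ W[assign[t]] for t in range(T)])

# Batched: gather each token's expert weights, then one batched matmul.
batched_out = np.matmul(x[:, None, :], W[assign]).squeeze(1)

assert np.allclose(loop_out, batched_out)
```

Both paths compute the same result; the batched form replaces T small matmuls with one vectorized call, which is what makes it attractive for decoding, where each step routes only a few tokens.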
Tokenization:
The main issue with the tokenization refactor was that `tokenizer_class` values were being "enforced" even though, in most cases, they are wrong. This took a while to properly isolate, and we now use `TokenizersBackend` whenever we can. #42894 has a much more detailed description of the big changes!
- Use `TokenizersBackend` by @ArthurZucker in #42894
- Fix convert_tekken_tokenizer by @juliendenize in #42592
- refactor more tokenizers - v5 guide update by @itazap in #42768
- [`Tokenizers`] Change treatment of special tokens by @vasqu in #42903
Core
Here we focused on boosting the performance of loading weights onto devices!
- [saving] Simplify general logic by @Cyrilvallez in #42766
- Do not rely on config for inferring model dtype by @Cyrilvallez in #42838
- Improve BatchFeature: stack list and lists of torch tensors by @yonigozlan in #42750
- Remove tied weights from internal attribute if they are not tied by @Cyrilvallez in #42871
- Enforce call to `post_init` and fix all of them by @Cyrilvallez in #42873
- Simplify tie weights logic by @Cyrilvallez in #42895
- Add buffers to `_init_weights` for ALL models by @Cyrilvallez in #42309
- [loading] Really initialize on meta device for huge perf gains by @Cyrilvallez in #42941
- Do not use accelerate hooks if the device_map has only 1 device by @Cyrilvallez in #43019
- Move missing weights and non-persistent buffers to correct device earlier by @Cyrilvallez in #43021
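One of the fixes above (#43019) skips accelerate hooks when the `device_map` resolves to a single device. The check can be sketched in a few lines; this is an illustrative reconstruction of the idea, not the actual implementation, and `needs_dispatch_hooks` is a made-up name.

```python
# Illustrative sketch: dispatch hooks that move tensors between devices at
# forward time are only needed when the device_map actually spans more than
# one device; a single-device map can run without that overhead.
def needs_dispatch_hooks(device_map: dict[str, str]) -> bool:
    return len(set(device_map.values())) > 1

# Everything on one device: no hooks required.
assert not needs_dispatch_hooks({"encoder": "cuda:0", "decoder": "cuda:0"})
# Modules split across devices: hooks are needed.
assert needs_dispatch_hooks({"encoder": "cuda:0", "decoder": "cpu"})
```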
New models
- Sam: Perception Encoder Audiovisual by @eustlb in #42905
- adds jais2 model support by @sarathc-cerebras in #42684
- Add Pixio pre-trained models by @LiheYoung in #42795
- [`Ernie 4.5`] Ernie VL models by @vasqu in #39585
- [loading][TP] Fix device placement at loading-time, and simplify sharding primitives by @Cyrilvallez in #43003
- GLM-ASR Support by @zRzRzRzRzRzRzR in #42875
Quantization
- [Devstral] Make sure FP8 conversion works correctly by @patrickvonplaten in #42715
- Fp8 dq by @SunMarc in #42926
- [Quantization] Removing misleading int8 quantization in Finegrained FP8 by @MekkCyber in #42945
- Fix deepspeed + quantization by @SunMarc in #43006
Breaking changes
Mostly around processors!
- 🚨 Fix ConvNeXt image processor default interpolation to BICUBIC by @lukepayyapilli in #42934
- 🚨 Fix EfficientNet image processor default interpolation to BICUBIC by @lukepayyapilli in #42956
- Add fast version of `convert_segmentation_map_to_binary_masks` to EoMT by @simonreise in #43073
- 🚨 Fix MobileViT image processor default interpolation to BICUBIC by @lukepayyapilli in #43024
Thanks again to everyone!
New Contributors
- @ZX-ModelCloud made their first contribution in #42833
- @AYou0207 made their first contribution in #42863
- @wasertech made their first contribution in #42864
- @preetam1407 made their first contribution in #42685
- @Taise228 made their first contribution in #41416
- @CandiedCode made their first contribution in #42885
- @sarathc-cerebras made their first contribution in #42684
- @nandan2003 made their first contribution in #42318
- @LiheYoung made their first contribution in #42795
- @majiayu000 made their first contribution in #42928
- @lukepayyapilli made their first contribution in #42934
- @leaderofARS made their first contribution in #42966
- @qianyue76 made their first contribution in #43095
- @stefgina made their first contribution in #43033
- @HuiyingLi made their first contribution in #43084
- @raimbekovm made their first contribution in #43038
- @PredictiveManish made their first contribution in #43053
- @pushkar-hue made their first contribution in #42736
- @vykhovanets made their first contribution in #43042
- @tanmay2004 made their first contribution in #42737
- @atultw made their first contribution in #43061
Full Changelog: v5.0.0rc1...v5.0.0rc2