GPTQ Integration
You can now fine-tune GPTQ-quantized models with PEFT. For examples of using PEFT with a GPTQ model, see the colab notebook and the finetuning script.
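Below is a minimal sketch (not taken from the linked notebook or script) of attaching a LoRA adapter to a GPTQ-quantized checkpoint; the model id, target modules and hyperparameters are placeholders, and loading a GPTQ checkpoint assumes `auto-gptq` and `optimum` are installed.

```python
# Hedged sketch: fine-tuning a GPTQ-quantized model with LoRA.
# The model id and LoRA hyperparameters below are placeholders, not values from this release.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/Llama-2-7B-GPTQ"  # any GPTQ-quantized checkpoint

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prepare the quantized model for training (casts norms, enables gradient checkpointing, ...)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # adjust to the base model's architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, train as usual, e.g. with transformers.Trainer or TRL's SFTTrainer.
```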
Low-level API
This enables users and developers to use PEFT as a utility library, at least for injectable adapters (LoRA, IA3, AdaLoRA). It exposes an API to modify a model in place and inject the adapter layers; a usage sketch follows the list below.
- [`core`] PEFT refactor + introducing `inject_adapter_in_model` public method by @younesbelkada in #749
- [`Low-level-API`] Add docs about LLAPI by @younesbelkada in #836
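A minimal sketch of the low-level API on a toy module; the dummy model and LoRA hyperparameters are illustrative only.

```python
import torch
from peft import LoraConfig, inject_adapter_in_model

# Toy model used purely for illustration.
class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.linear = torch.nn.Linear(10, 10)
        self.lm_head = torch.nn.Linear(10, 10)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        x = self.linear(x)
        return self.lm_head(x)

lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1, bias="none", target_modules=["linear"]
)

model = DummyModel()
# Modifies the model in place: the `linear` layer is replaced by a LoRA layer.
model = inject_adapter_in_model(lora_config, model)

dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5]])
dummy_outputs = model(dummy_inputs)
```

Note that, unlike `get_peft_model`, this returns a plain `torch.nn.Module` rather than a `PeftModel`, so PEFT utility methods such as `save_pretrained` are not attached.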
Support for XPU and NPU devices
You can now load and fine-tune PEFT adapters on more devices, including Intel XPU and Ascend NPU; a device-selection sketch follows the list below.
- Support XPU adapter loading by @abhilash1910 in #737
- Support Ascend NPU adapter loading by @statelesshz in #772
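A hedged sketch of loading an adapter onto one of these devices; the base model and adapter ids are placeholders, and device availability depends on the installed backend (e.g. intel-extension-for-pytorch for XPU, torch_npu for Ascend NPU).

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")   # placeholder adapter

# Pick an accelerator if the corresponding torch backend is present.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    model = model.to("xpu")
elif hasattr(torch, "npu") and torch.npu.is_available():  # requires `import torch_npu`
    model = model.to("npu")
else:
    model = model.to("cpu")
```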
Mix-and-match LoRAs
Stable support and new ways of merging multiple LoRAs. There are currently three supported ways of merging LoRAs: `linear`, `svd` and `cat`; a usage sketch follows the entry below.
- Added additional parameters to mixing multiple LoRAs through SVD, added ability to mix LoRAs through concatenation by @kovalexal in #817
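A minimal sketch of combining two LoRA adapters with `add_weighted_adapter`; the base model and adapter paths are placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
model = PeftModel.from_pretrained(base, "path/to/lora_A", adapter_name="lora_A")
model.load_adapter("path/to/lora_B", adapter_name="lora_B")

# combination_type can be "linear", "svd" or "cat".
model.add_weighted_adapter(
    adapters=["lora_A", "lora_B"],
    weights=[0.7, 0.3],
    adapter_name="mixed",
    combination_type="svd",
)
model.set_adapter("mixed")
```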
What's Changed
- Release version 0.5.0.dev0 by @pacman100 in #717
- Fix subfolder issue by @younesbelkada in #721
- Add falcon to officially supported LoRA & IA3 modules by @younesbelkada in #722
- revert change by @pacman100 in #731
- fix(pep561): include packaging type information by @aarnphm in #729
- [`Llama2`] Add disabling TP behavior by @younesbelkada in #728
- [`Patch`] patch trainable params for 4bit layers by @younesbelkada in #733
- FIX: Warning when initializing prompt encoder by @BenjaminBossan in #716
- ENH: Warn when disabling adapters and bias != 'none' by @BenjaminBossan in #741
- FIX: Disabling adapter works with modules_to_save by @BenjaminBossan in #736
- Updated Example in Class:LoraModel by @TianyiPeng in #672
- [`AdaLora`] Fix adalora inference issue by @younesbelkada in #745
- Add btlm to officially supported LoRA by @Trapper4888 in #751
- [`ModulesToSave`] add correct hook management for modules to save by @younesbelkada in #755
- Example notebooks for LoRA with custom models by @BenjaminBossan in #724
- Add tests for AdaLoRA, fix a few bugs by @BenjaminBossan in #734
- Add progressbar unload/merge by @BramVanroy in #753
- Support XPU adapter loading by @abhilash1910 in #737
- Support Ascend NPU adapter loading by @statelesshz in #772
- Allow passing inputs_embeds instead of input_ids by @BenjaminBossan in #757
- [`core`] PEFT refactor + introducing `inject_adapter_in_model` public method by @younesbelkada in #749
- Add adapter error handling by @BenjaminBossan in #800
- add lora default target module for codegen by @sywangyi in #787
- DOC: Update docstring of PeftModel.from_pretrained by @BenjaminBossan in #799
- fix crash when using torch.nn.DataParallel for LORA inference by @sywangyi in #805
- Peft model signature by @kiansierra in #784
- GPTQ Integration by @SunMarc in #771
- Only fail quantized Lora unload when actually merging by @BlackHC in #822
- Added additional parameters to mixing multiple LoRAs through SVD, added ability to mix LoRAs through concatenation by @kovalexal in #817
- TST: add test about loading custom models by @BenjaminBossan in #827
- Fix unbound error in ia3.py by @His-Wardship in #794
- [`Docker`] Fix gptq dockerfile by @younesbelkada in #835
- [`Tests`] Add 4bit slow training tests by @younesbelkada in #834
- [`Low-level-API`] Add docs about LLAPI by @younesbelkada in #836
- Type annotation fix by @vwxyzjn in #840
New Contributors
- @TianyiPeng made their first contribution in #672
- @Trapper4888 made their first contribution in #751
- @abhilash1910 made their first contribution in #737
- @statelesshz made their first contribution in #772
- @kiansierra made their first contribution in #784
- @BlackHC made their first contribution in #822
- @His-Wardship made their first contribution in #794
- @vwxyzjn made their first contribution in #840
Full Changelog: v0.4.0...v0.5.0