Highlights

New methods

BOFT

Thanks to @yfeng95, @Zeju1997, and @YuliangXiu, PEFT was extended with BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (#1326, BOFT paper link). In PEFT v0.7.0, we already added OFT, but BOFT is even more parameter efficient. Check out the included BOFT controlnet and BOFT dreambooth examples.

VeRA

If the parameter reduction of LoRA is not enough for your use case, you should take a close look at VeRA: Vector-based Random Matrix Adaptation (#1564, VeRA paper link). This method resembles LoRA but adds two learnable scaling vectors to the two low-rank weight matrices. The matrices themselves are frozen, randomly initialized, and shared across all layers, so only the scaling vectors are trained, which considerably reduces the number of trainable parameters.
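To get a feeling for the savings, here is a back-of-the-envelope comparison of trainable parameters. It assumes the formulation from the VeRA paper, where each adapted layer trains one scaling vector of size `r` and one of size `d_out`, while the shared matrices stay frozen. The layer sizes and ranks are illustrative values, not recommendations.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int, n_layers: int) -> int:
    """LoRA trains A (r x d_in) and B (d_out x r) for every adapted layer."""
    return n_layers * r * (d_in + d_out)


def vera_trainable_params(d_out: int, r: int, n_layers: int) -> int:
    """VeRA trains only two scaling vectors (sizes r and d_out) per adapted layer.

    Assumption: the shared low-rank matrices are frozen and therefore not counted.
    """
    return n_layers * (r + d_out)


# Illustrative setup: 32 adapted layers with 4096 x 4096 weights.
lora = lora_trainable_params(d_in=4096, d_out=4096, r=16, n_layers=32)
vera = vera_trainable_params(d_out=4096, r=256, n_layers=32)

print(f"LoRA (r=16):  {lora:,} trainable parameters")
print(f"VeRA (r=256): {vera:,} trainable parameters")
```

Even with a much larger rank, the VeRA count in this sketch stays well below the LoRA count, because it grows with `r + d_out` per layer instead of `r * (d_in + d_out)`.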

The bulk of this PR was implemented by contributor @vvvm23 with the help of @dkopi.

PiSSA

PiSSA, Principal Singular values and Singular vectors Adaptation, is a new initialization method for LoRA, which was added by @fxmeng (#1626, PiSSA paper link). The improved initialization promises to speed up convergence and improve the final performance of LoRA models. When using models quantized with bitsandbytes, PiSSA initialization should reduce the quantization error, similar to LoftQ.

Quantization

HQQ

Thanks to @fahadh4ilyas, PEFT LoRA linear layers now support Half-Quadratic Quantization, HQQ (#1618, HQQ repo). HQQ is fast and efficient (down to 2 bits), while not requiring calibration data.

EETQ

Another new quantization method supported in PEFT is Easy & Efficient Quantization for Transformers, EETQ (#1675, EETQ repo). This 8-bit quantization method works for LoRA linear layers and should be faster than bitsandbytes.

Show adapter layer and model status

We added a feature to show adapter layer and model status of PEFT models in #1663. With the newly added methods, you can easily check which adapters exist on your model, whether they are enabled, whether their gradients are active, and which ones are active or merged. You will also be informed if irregularities have been detected.

To use this new feature, call `model.get_layer_status()` for layer-level information, and `model.get_model_status()` for model-level information. For more details, check out our docs on layer and model status.

Changes

Edge case of how we deal with `modules_to_save`

Previously, when using classes such as `PeftModelForSequenceClassification`, we implicitly added the classifier layers to `model.modules_to_save`. However, this only created a new `ModulesToSaveWrapper` instance for the first adapter being initialized. When a second adapter was initialized via `model.add_adapter`, this information was ignored. Now, `peft_config.modules_to_save` is updated explicitly to add the classifier layers (#1615). This is a departure from how this worked previously, but it better reflects the intended behavior.

Furthermore, when merging multiple LoRA adapters using `model.add_weighted_adapter`, if these adapters had `modules_to_save`, the original parameters of these modules would be used. This is unexpected and will most likely result in bad outputs. As there is no clear way to merge these modules, we decided to raise an error in this case (#1615).

What's Changed

Bump version to 0.10.1.dev0 by @BenjaminBossan in #1578
FIX Minor issues in docs, re-raising exception by @BenjaminBossan in #1581
FIX / Docs: Fix doc link for layer replication by @younesbelkada in #1582
DOC: Short section on using transformers pipeline by @BenjaminBossan in #1587
Extend PeftModel.from_pretrained() to models with disk-offloaded modules by @blbadger in #1431
[feat] Add lru_cache to import_utils calls that did not previously have it by @tisles in #1584
fix deepspeed zero3+prompt tuning bug. word_embeddings.weight shape i… by @sywangyi in #1591
MNT: Update GH bug report template by @BenjaminBossan in #1600
fix the torch_dtype and quant_storage_dtype by @pacman100 in #1614
FIX In the image classification example, Change the model to the LoRA… by @changhwa in #1624
Remove duplicated import by @nzw0301 in #1622
FIX: bnb config wrong argument names by @BenjaminBossan in #1603
FIX Make DoRA work with Conv1D layers by @BenjaminBossan in #1588
FIX: Send results to correct channel by @younesbelkada in #1628
FEAT: Allow ignoring mismatched sizes when loading by @BenjaminBossan in #1620
itemsize is torch>=2.1, use element_size() by @winglian in #1630
FIX Multiple adapters and modules_to_save by @BenjaminBossan in #1615
FIX Correctly call element_size by @BenjaminBossan in #1635
fix: allow load_adapter to use different device by @yhZhai in #1631
Adalora deepspeed by @sywangyi in #1625
Adding BOFT: Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization by @yfeng95 in #1326
Don't use deprecated Repository anymore by @Wauplin in #1641
FIX Errors in the transformers integration docs by @BenjaminBossan in #1629
update figure assets of BOFT by @YuliangXiu in #1642
print_trainable_parameters - format % to be sensible by @stas00 in #1648
FIX: Bug with handling of active adapters by @BenjaminBossan in #1659
Remove dreambooth Git link by @charliermarsh in #1660
add safetensor load in multitask_prompt_tuning by @sywangyi in #1662
Adds Vera (Vector Based Random Matrix Adaption) #2 by @BenjaminBossan in #1564
Update deepspeed.md by @sanghyuk-choi in #1679
ENH: Add multi-backend tests for bnb by @younesbelkada in #1667
FIX / Workflow: Fix Mac-OS CI issues by @younesbelkada in #1680
FIX Use trl version of tiny random llama by @BenjaminBossan in #1681
FIX: Don't eagerly import bnb for LoftQ by @BenjaminBossan in #1683
FEAT: Add EETQ support in PEFT by @younesbelkada in #1675
FIX / Workflow: Always notify on slack for docker image workflows by @younesbelkada in #1682
FIX: upgrade autoawq to latest version by @younesbelkada in #1684
FIX: Initialize DoRA weights in float32 if float16 is being used by @BenjaminBossan in #1653
fix bf16 model type issue for ia3 by @sywangyi in #1634
FIX Issues with AdaLora initialization by @BenjaminBossan in #1652
FEAT Show adapter layer and model status by @BenjaminBossan in #1663
Fixing the example by providing correct tokenized seq length by @jpodivin in #1686
TST: Skiping AWQ tests for now .. by @younesbelkada in #1690
Add LayerNorm tuning model by @DTennant in #1301
FIX Use different doc builder docker image by @BenjaminBossan in #1697
Set experimental dynamo config for compile tests by @BenjaminBossan in #1698
fix the fsdp peft autowrap policy by @pacman100 in #1694
Add LoRA support to HQQ Quantization by @fahadh4ilyas in #1618
FEAT Helper to check if a model is a PEFT model by @BenjaminBossan in #1713
support Cambricon MLUs device by @huismiling in #1687
Some small cleanups in docstrings, copyright note by @BenjaminBossan in #1714
Fix docs typo by @NielsRogge in #1719
revise run_peft_multigpu.sh by @abzb1 in #1722
Workflow: Add slack messages workflow by @younesbelkada in #1723
DOC Document the PEFT checkpoint format by @BenjaminBossan in #1717
FIX Allow DoRA init on CPU when using BNB by @BenjaminBossan in #1724
Adding PiSSA as an optional initialization method of LoRA by @fxmeng in #1626

New Contributors

@tisles made their first contribution in #1584
@changhwa made their first contribution in #1624
@yhZhai made their first contribution in #1631
@yfeng95 made their first contribution in #1326
@YuliangXiu made their first contribution in #1642
@charliermarsh made their first contribution in #1660
@sanghyuk-choi made their first contribution in #1679
@jpodivin made their first contribution in #1686
@DTennant made their first contribution in #1301
@fahadh4ilyas made their first contribution in #1618
@huismiling made their first contribution in #1687
@NielsRogge made their first contribution in #1719
@abzb1 made their first contribution in #1722
@fxmeng made their first contribution in #1626

Full Changelog: v0.10.0...v0.11.0

peft v0.11.0: New PEFT methods BOFT, VeRA, PiSSA, quantization with HQQ and EETQ, and more