Highlights
Poly PEFT method
Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists of pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (`Poly`) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. Put simply, you can think of it as a Mixture of Expert Adapters.
`MHR` (Multi-Head Routing) combines subsets of adapter parameters and outperforms `Poly` under a comparable parameter budget; by fine-tuning only the routing function and not the adapters (`MHR`-z), it achieves competitive performance with extreme parameter efficiency.
- Add Poly by @TaoSunVoyage in #1129
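A minimal sketch of how Poly might be configured with PEFT; the base model and the number of tasks/skills are illustrative assumptions, not values from the release:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PolyConfig, get_peft_model

# Example base model; any seq2seq (or causal) LM should work the same way.
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

config = PolyConfig(
    task_type="SEQ_2_SEQ_LM",
    poly_type="poly",  # Polytropon-style routing over the adapter inventory
    r=8,               # rank of each LoRA "skill" in the inventory
    n_tasks=4,         # number of tasks in the multi-task training set (illustrative)
    n_skills=4,        # size of the adapter (skill) inventory (illustrative)
    n_splits=1,        # values > 1 correspond to Multi-Head Routing (MHR)
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()
# During training, each batch should also carry a `task_ids` tensor so the
# router knows which task every example belongs to.
```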
LoRA improvements
You can now pass `all-linear` to the `target_modules` parameter of `LoraConfig` to target all linear layers, which the QLoRA paper showed performs better than targeting only the query and value attention layers.
- Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
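A short sketch of the new option; the base model and LoRA hyperparameters are only examples:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Example base model; substitute the model you are fine-tuning.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",  # adapt every linear layer instead of hand-picking module names
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```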
Embedding layers of base models are now automatically saved when they are resized during fine-tuning with PEFT approaches like LoRA. This enables extending the tokenizer's vocabulary with special tokens, which is a common use case when doing the following (a short sketch follows below):
- Instruction finetuning with new tokens being added such as <|user|>, <|assistant|>, <|system|>, <|im_end|>, <|im_start|>, </s>, <s> to properly format the conversations
- Finetuning on a specific language wherein language-specific tokens are added, e.g., Korean tokens being added to the vocabulary for finetuning an LLM on Korean datasets.
- Instruction finetuning to return outputs in a certain format and enable agent behaviour, with new tokens such as <|FUNCTIONS|>, <|BROWSE|>, <|TEXT2IMAGE|>, <|ASR|>, <|TTS|>, <|GENERATECODE|>, <|RAG|>.
A good blog post to learn more about this is https://www.philschmid.de/fine-tune-llms-in-2024-with-trl.
- save the embeddings even when they aren't targetted but resized by @pacman100 in #1383
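A minimal sketch of the add-tokens-then-resize workflow, assuming a Mistral base model and LoRA on all linear layers; the model id, tokens, and output path are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Example model id; the pattern is the same for other base models.
model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Add chat-formatting special tokens and resize the embedding matrix to match.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|user|>", "<|assistant|>", "<|system|>"]}
)
model.resize_token_embeddings(len(tokenizer))

peft_model = get_peft_model(
    model, LoraConfig(r=16, target_modules="all-linear", task_type="CAUSAL_LM")
)

# Per this release, save_pretrained detects that the embeddings were resized and
# stores them alongside the adapter weights, so the adapter loads correctly later.
peft_model.save_pretrained("mistral-lora-with-new-tokens")
```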
New option `use_rslora` in `LoraConfig`. Use it for ranks greater than 32 and see an increase in fine-tuning performance (the same or better performance for ranks lower than 32 as well).
- Added the option to use the corrected scaling factor for LoRA, based on new research. by @Damjan-Kalajdzievski in #1244
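A small configuration sketch; the rank and alpha values are illustrative:

```python
from peft import LoraConfig

# rsLoRA changes the adapter scaling from lora_alpha / r to lora_alpha / sqrt(r),
# which stabilizes fine-tuning at higher ranks.
config = LoraConfig(
    r=64,
    lora_alpha=16,
    use_rslora=True,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```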
Documentation improvements
- Refactoring and updating of the concept guides. [docs] Concept guides by @stevhliu in #1269
- Improving task guides to focus more on how to use different PEFT methods and their nuances instead of focusing on different types of tasks. It condenses the individual guides into a single one to highlight the commonalities and differences, and refers to existing docs to avoid duplication. [docs] Task guides by @stevhliu in #1332
- DOC: Update docstring for the config classes by @BenjaminBossan in #1343
- LoftQ: edit README.md and example files by @yxli2123 in #1276
- [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
- DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
- [docs] Docstring link by @stevhliu in #1356
- QOL improvements and doc updates by @pacman100 in #1318
- Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
- DOC: Improve target modules description by @BenjaminBossan in #1290
- DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
- DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
- Improve documentation for the `all-linear` flag by @SumanthRH in #1357
- Fix various typos in LoftQ docs. by @arnavgarg1 in #1408
What's Changed
- Bump version to 0.7.2.dev0 post release by @BenjaminBossan in #1258
- FIX Error in log_reports.py by @BenjaminBossan in #1261
- Fix ModulesToSaveWrapper getattr by @zhangsheng377 in #1238
- TST: Revert device_map for AdaLora 4bit GPU test by @BenjaminBossan in #1266
- remove a duplicated description in peft BaseTuner by @butyuhao in #1271
- Added the option to use the corrected scaling factor for LoRA, based on new research. by @Damjan-Kalajdzievski in #1244
- feat: add apple silicon GPU acceleration by @NripeshN in #1217
- LoftQ: Allow quantizing models loaded on the CPU for LoftQ initialization by @hiyouga in #1256
- LoftQ: edit README.md and example files by @yxli2123 in #1276
- TST: Extend LoftQ tests to check CPU initialization by @BenjaminBossan in #1274
- Refactor and a couple of fixes for adapter layer updates by @BenjaminBossan in #1268
- [`Tests`] Add bitsandbytes installed from source on new docker images by @younesbelkada in #1275
- TST: Enable LoftQ 8bit tests by @BenjaminBossan in #1279
- [`bnb`] Add bnb nightly workflow by @younesbelkada in #1282
- Fixed several errors in StableDiffusion adapter conversion script by @kovalexal in #1281
- [docs] Concept guides by @stevhliu in #1269
- DOC: Improve target modules description by @BenjaminBossan in #1290
- [`bnb-nightly`] Address final comments by @younesbelkada in #1287
- [BNB] Fix bnb dockerfile for latest version by @SunMarc in #1291
- fix fsdp auto wrap policy by @pacman100 in #1302
- [BNB] fix dockerfile for single gpu by @SunMarc in #1305
- Fix bnb lora layers not setting active adapter by @tdrussell in #1294
- Mistral IA3 config defaults by @pacman100 in #1316
- fix the embedding saving for adaption prompt by @pacman100 in #1314
- fix diffusers tests by @pacman100 in #1317
- FIX Use torch.long instead of torch.int in LoftQ for PyTorch versions <2.x by @BenjaminBossan in #1320
- Extend merge_and_unload to offloaded models by @blbadger in #1190
- Add an option 'ALL' to include all linear layers as target modules by @SumanthRH in #1295
- Refactor dispatching logic of LoRA layers by @BenjaminBossan in #1319
- Fix bug when load the prompt tuning in inference. by @yileld in #1333
- DOC Troubleshooting for unscaling error with fp16 by @BenjaminBossan in #1336
- ENH: Add attribute to show targeted module names by @BenjaminBossan in #1330
- fix some args desc by @zspo in #1338
- Fix logic in target module finding by @s-k-yx in #1263
- Doc about AdaLoraModel.update_and_allocate by @kuronekosaiko in #1341
- DOC: Update docstring for the config classes by @BenjaminBossan in #1343
- fix `prepare_inputs_for_generation` logic for Prompt Learning methods by @pacman100 in #1352
- QOL improvements and doc updates by @pacman100 in #1318
- New transformers caching ETA now v4.38 by @BenjaminBossan in #1348
- FIX Setting active adapter for quantized layers by @BenjaminBossan in #1347
- DOC Extending the vocab and storing embeddings by @BenjaminBossan in #1335
- [Docs] make add_weighted_adapter example clear in the docs. by @sayakpaul in #1353
- DOC Add PeftMixedModel to API docs by @BenjaminBossan in #1354
- Add Poly by @TaoSunVoyage in #1129
- [docs] Docstring link by @stevhliu in #1356
- Added missing getattr dunder methods for mixed model by @kovalexal in #1365
- Handle resizing of embedding layers for AutoPeftModel by @pacman100 in #1367
- account for the new merged/unmerged weight to perform the quantization again by @pacman100 in #1370
- add mixtral in LoRA mapping by @younesbelkada in #1380
- save the embeddings even when they aren't targetted but resized by @pacman100 in #1383
- Improve documentation for the `all-linear` flag by @SumanthRH in #1357
- Fix LoRA module mapping for Phi models by @arnavgarg1 in #1375
- [docs] Task guides by @stevhliu in #1332
- Add generic PeftConfig constructor from kwargs by @sfriedowitz in #1398
- Fix various typos in LoftQ docs. by @arnavgarg1 in #1408
- Release: v0.8.0 by @pacman100 in #1406
New Contributors
- @butyuhao made their first contribution in #1271
- @Damjan-Kalajdzievski made their first contribution in #1244
- @NripeshN made their first contribution in #1217
- @hiyouga made their first contribution in #1256
- @tdrussell made their first contribution in #1294
- @blbadger made their first contribution in #1190
- @yileld made their first contribution in #1333
- @s-k-yx made their first contribution in #1263
- @kuronekosaiko made their first contribution in #1341
- @TaoSunVoyage made their first contribution in #1129
- @arnavgarg1 made their first contribution in #1375
- @sfriedowitz made their first contribution in #1398
Full Changelog: v0.7.1...v0.8.0