Towards Modular LLMs by Building and Reusing a Library of LoRAs

Oleksiy Ostapenko, Zhan Su, Edoardo Ponti, Laurent Charlin, Nicolas Le Roux, Lucas Caccia, Alessandro Sordoni
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:38885-38904, 2024.

Abstract

Given the increasing number of parameter-efficient adapters of large language models (LLMs), how can we reuse them to improve LLM performance on new tasks? We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such a library. We benchmark existing approaches to build this library and introduce model-based clustering, $\texttt{MBC}$, a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. To reuse the library, we present a novel zero-shot routing mechanism, $\texttt{Arrow}$, which enables dynamic selection of the most relevant adapters for new inputs without the need for retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. Thus, we take steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.
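The abstract describes two mechanisms: MBC, which groups tasks by the similarity of their trained LoRA parameters before building one adapter per group, and Arrow, which routes new inputs to the most relevant adapters without retraining. The sketch below is not the authors' implementation; it is a minimal NumPy illustration under our own assumptions: MBC is approximated by k-means over flattened, normalized LoRA weights, and Arrow-style routing is approximated by scoring a hidden state against the top right singular direction of each expert's LoRA update B A. All names (library, kmeans, prototype) and toy dimensions are illustrative.

import numpy as np

# A minimal sketch (not the authors' code): MBC-style clustering of a LoRA
# library followed by Arrow-style zero-shot routing of a new input.
rng = np.random.default_rng(0)

# Toy library: one LoRA pair (A: r x d, B: d x r) per training task.
d_model, r, n_tasks, n_clusters = 64, 4, 16, 4
library = [(rng.normal(size=(r, d_model)), rng.normal(size=(d_model, r)))
           for _ in range(n_tasks)]

# MBC-style step: cluster tasks by similarity of their adapter parameters.
def flatten(adapter):
    A, B = adapter
    return np.concatenate([A.ravel(), B.ravel()])

X = np.stack([flatten(a) for a in library])
X /= np.linalg.norm(X, axis=1, keepdims=True)           # cosine-like geometry

def kmeans(X, k, iters=50):
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(X @ centers.T, axis=1)        # assign to nearest center
        centers = np.stack([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
        centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    return labels

labels = kmeans(X, n_clusters)

# In the paper, one adapter would be trained on each cluster's pooled data;
# here we simply reuse the first adapter of each non-empty cluster.
experts = []
for j in range(n_clusters):
    members = np.flatnonzero(labels == j)
    if len(members):
        experts.append(library[members[0]])

# Arrow-style step: route a new input without retraining. We assume each
# expert is summarized by the top right singular direction of its update B A.
def prototype(adapter):
    A, B = adapter
    _, _, Vt = np.linalg.svd(B @ A, full_matrices=False)
    return Vt[0]

protos = np.stack([prototype(e) for e in experts])       # one direction per expert
h = rng.normal(size=d_model)                             # hidden state of a new input
scores = np.abs(protos @ h)                              # alignment with each expert
weights = np.exp(scores - scores.max())
weights /= weights.sum()                                 # softmax routing weights
print("routing weights over cluster experts:", np.round(weights, 3))

The resulting weights could be used to mix the selected adapters' outputs for that input; in the paper the cluster experts come from retraining on the clustered data rather than from reusing a single library adapter, so this is only a structural stand-in.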

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ostapenko24a,
  title     = {Towards Modular {LLM}s by Building and Reusing a Library of {L}o{RA}s},
  author    = {Ostapenko, Oleksiy and Su, Zhan and Ponti, Edoardo and Charlin, Laurent and Le Roux, Nicolas and Caccia, Lucas and Sordoni, Alessandro},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {38885--38904},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ostapenko24a/ostapenko24a.pdf},
  url       = {https://proceedings.mlr.press/v235/ostapenko24a.html},
  abstract  = {Given the increasing number of parameter-efficient adapters of large language models (LLMs), how can we reuse them to improve LLM performance on new tasks? We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approaches to build this library and introduce model-based clustering, $\texttt{MBC}$, a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. In order to reuse the library, we present a novel zero-shot routing mechanism, $\texttt{Arrow}$, which enables dynamic selection of the most relevant adapters for new inputs without the need for retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. Thus, we make steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.}
}
Endnote
%0 Conference Paper
%T Towards Modular LLMs by Building and Reusing a Library of LoRAs
%A Oleksiy Ostapenko
%A Zhan Su
%A Edoardo Ponti
%A Laurent Charlin
%A Nicolas Le Roux
%A Lucas Caccia
%A Alessandro Sordoni
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ostapenko24a
%I PMLR
%P 38885--38904
%U https://proceedings.mlr.press/v235/ostapenko24a.html
%V 235
%X Given the increasing number of parameter-efficient adapters of large language models (LLMs), how can we reuse them to improve LLM performance on new tasks? We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approaches to build this library and introduce model-based clustering, $\texttt{MBC}$, a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. In order to reuse the library, we present a novel zero-shot routing mechanism, $\texttt{Arrow}$, which enables dynamic selection of the most relevant adapters for new inputs without the need for retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. Thus, we make steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.
APA
Ostapenko, O., Su, Z., Ponti, E., Charlin, L., Le Roux, N., Caccia, L. & Sordoni, A. (2024). Towards Modular LLMs by Building and Reusing a Library of LoRAs. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:38885-38904. Available from https://proceedings.mlr.press/v235/ostapenko24a.html.
