Meta-Adapters: Parameter Efficient Few-shot Fine-tuning through Meta-Learning

Trapit Bansal, Salaheddin Alzubi, Tong Wang, Jay-Yoon Lee, Andrew McCallum
Proceedings of the First International Conference on Automated Machine Learning, PMLR 188:19/1-18, 2022.

Abstract

Consistent improvements in the representational capacity of large pre-trained transformers have made it increasingly viable to serve these models as shared priors that can be fine-tuned on a large number of downstream tasks. However, fine-tuning the entire model for every task of interest makes a copy of all the model parameters, rendering such scenarios highly impractical. Recently introduced Adapter methods propose a promising alternative, one where only a small number of additional parameters are introduced per task specifically for fine-tuning. However, Adapters often require large amounts of task-specific data for good performance and don’t work well in data-scarce few-shot scenarios. In this paper, we approach parameter-efficient fine-tuning in few-shot settings from a meta-learning perspective. We introduce Meta-Adapters, which are small blocks of meta-learned adapter layers inserted in a pre-trained model that re-purpose a frozen pre-trained model into a parameter-efficient few-shot learner. Meta-Adapters perform competitively with state-of-the-art few-shot learning methods that require full fine-tuning, while only fine-tuning 0.6% of the parameters. We evaluate Meta-Adapters along with multiple transfer learning baselines on an evaluation suite of 17 classification tasks and find that they improve few-shot accuracy by a large margin over competitive parameter-efficient methods, while requiring significantly fewer parameters for fine-tuning. Moreover, when comparing few-shot prompting of GPT-3 against few-shot fine-tuning with Meta-Adapters, we find that Meta-Adapters perform competitively while working with pre-trained transformers that are many orders of magnitude (1590×) smaller in size than GPT-3.

Cite this Paper


BibTeX
@InProceedings{pmlr-v188-bansal22a,
  title = {Meta-Adapters: Parameter Efficient Few-shot Fine-tuning through Meta-Learning},
  author = {Bansal, Trapit and Alzubi, Salaheddin and Wang, Tong and Lee, Jay-Yoon and McCallum, Andrew},
  booktitle = {Proceedings of the First International Conference on Automated Machine Learning},
  pages = {19/1--18},
  year = {2022},
  editor = {Guyon, Isabelle and Lindauer, Marius and van der Schaar, Mihaela and Hutter, Frank and Garnett, Roman},
  volume = {188},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v188/bansal22a/bansal22a.pdf},
  url = {https://proceedings.mlr.press/v188/bansal22a.html},
  abstract = {Consistent improvements in the representational capacity of large pre-trained transformers have made it increasingly viable to serve these models as shared priors that can be fine-tuned on a large number of downstream tasks. However, fine-tuning the entire model for every task of interest makes a copy of all the model parameters, rendering such scenarios highly impractical. Recently introduced Adapter methods propose a promising alternative, one where only a small number of additional parameters are introduced per task specifically for fine-tuning. However, Adapters often require large amounts of task-specific data for good performance and don’t work well in data-scarce few-shot scenarios. In this paper, we approach parameter-efficient fine-tuning in few-shot settings from a meta-learning perspective. We introduce Meta-Adapters, which are small blocks of meta-learned adapter layers inserted in a pre-trained model that re-purpose a frozen pre-trained model into a parameter-efficient few-shot learner. Meta-Adapters perform competitively with state-of-the-art few-shot learning methods that require full fine-tuning, while only fine-tuning 0.6\% of the parameters. We evaluate Meta-Adapters along with multiple transfer learning baselines on an evaluation suite of 17 classification tasks and find that they improve few-shot accuracy by a large margin over competitive parameter-efficient methods, while requiring significantly fewer parameters for fine-tuning. Moreover, when comparing few-shot prompting of GPT-3 against few-shot fine-tuning with Meta-Adapters, we find that Meta-Adapters perform competitively while working with pre-trained transformers that are many orders of magnitude (1590{\texttimes}) smaller in size than GPT-3.}
}
Endnote
%0 Conference Paper
%T Meta-Adapters: Parameter Efficient Few-shot Fine-tuning through Meta-Learning
%A Trapit Bansal
%A Salaheddin Alzubi
%A Tong Wang
%A Jay-Yoon Lee
%A Andrew McCallum
%B Proceedings of the First International Conference on Automated Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Isabelle Guyon
%E Marius Lindauer
%E Mihaela van der Schaar
%E Frank Hutter
%E Roman Garnett
%F pmlr-v188-bansal22a
%I PMLR
%P 19/1--18
%U https://proceedings.mlr.press/v188/bansal22a.html
%V 188
%X Consistent improvements in the representational capacity of large pre-trained transformers have made it increasingly viable to serve these models as shared priors that can be fine-tuned on a large number of downstream tasks. However, fine-tuning the entire model for every task of interest makes a copy of all the model parameters, rendering such scenarios highly impractical. Recently introduced Adapter methods propose a promising alternative, one where only a small number of additional parameters are introduced per task specifically for fine-tuning. However, Adapters often require large amounts of task-specific data for good performance and don’t work well in data-scarce few-shot scenarios. In this paper, we approach parameter-efficient fine-tuning in few-shot settings from a meta-learning perspective. We introduce Meta-Adapters, which are small blocks of meta-learned adapter layers inserted in a pre-trained model that re-purpose a frozen pre-trained model into a parameter-efficient few-shot learner. Meta-Adapters perform competitively with state-of-the-art few-shot learning methods that require full fine-tuning, while only fine-tuning 0.6% of the parameters. We evaluate Meta-Adapters along with multiple transfer learning baselines on an evaluation suite of 17 classification tasks and find that they improve few-shot accuracy by a large margin over competitive parameter-efficient methods, while requiring significantly fewer parameters for fine-tuning. Moreover, when comparing few-shot prompting of GPT-3 against few-shot fine-tuning with Meta-Adapters, we find that Meta-Adapters perform competitively while working with pre-trained transformers that are many orders of magnitude (1590×) smaller in size than GPT-3.
APA
Bansal, T., Alzubi, S., Wang, T., Lee, J. & McCallum, A. (2022). Meta-Adapters: Parameter Efficient Few-shot Fine-tuning through Meta-Learning. Proceedings of the First International Conference on Automated Machine Learning, in Proceedings of Machine Learning Research 188:19/1-18. Available from https://proceedings.mlr.press/v188/bansal22a.html.
