BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Asa Cooper Stickland; Iain Murray

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5986-5995, 2019.

Abstract

Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-stickland19a,
  title = 	 {{BERT} and {PAL}s: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning},
  author =       {Stickland, Asa Cooper and Murray, Iain},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {5986--5995},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/stickland19a/stickland19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/stickland19a.html},
  abstract = 	 {Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a \hbox{single} BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with $\approx$7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.}
}

Endnote

%0 Conference Paper
%T BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
%A Asa Cooper Stickland
%A Iain Murray
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-stickland19a
%I PMLR
%P 5986--5995
%U https://proceedings.mlr.press/v97/stickland19a.html
%V 97
%X Multi-task learning shares information between related tasks, sometimes reducing the number of parameters required. State-of-the-art results across multiple natural language understanding tasks in the GLUE benchmark have previously used transfer from a single large task: unsupervised pre-training with BERT, where a separate BERT model was fine-tuned for each task. We explore multi-task approaches that share a single BERT model with a small number of additional task-specific parameters. Using new adaptation modules, PALs or ‘projected attention layers’, we match the performance of separately fine-tuned models on the GLUE benchmark with ≈7 times fewer parameters, and obtain state-of-the-art results on the Recognizing Textual Entailment dataset.

APA

Stickland, A.C. & Murray, I. (2019). BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5986-5995. Available from https://proceedings.mlr.press/v97/stickland19a.html.
