Understanding Inverse Scaling and Emergence in Multitask Representation Learning

Muhammed E Ildiz, Zhe Zhao, Samet Oymak
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:4726-4734, 2024.

Abstract

Large language models exhibit strong multitasking capabilities; however, their learning dynamics as a function of task characteristics, sample size, and model complexity remain poorly understood. For instance, it is known that, as model size grows, large language models exhibit emergent abilities, where performance on certain tasks can abruptly jump from poor to respectable. Such phenomena motivate a deeper understanding of how individual tasks evolve during multitasking. To this end, we study a multitask representation learning setup in which tasks can have distinct distributions, quantified by their covariance priors. Through random matrix theory, we precisely characterize the optimal linear representation for few-shot learning that minimizes the average test risk, in terms of the task covariances. When tasks have equal sample sizes, we prove a reduction to an equivalent problem with a single effective covariance from which the individual task risks of the original problem can be deduced. Importantly, we introduce “task competition” to explain how tasks with a dominant covariance eigenspectrum emerge faster than others. We show that task competition can potentially explain the inverse scaling of certain tasks, i.e., reduced test accuracy as the model grows. Overall, this work sheds light on the risk and emergence of individual tasks and uncovers new high-dimensional phenomena (including multiple-descent risk curves) that arise in multitask representation learning.
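The abstract does not spell out the formal model, so the following is a minimal sketch of the kind of setup it describes, useful only for fixing ideas; every symbol below ($T$, $d$, $k$, $\Sigma_t$, $W$, $\mathcal{R}_t$) is an illustrative assumption rather than notation taken from the paper. Suppose each of $T$ tasks has a linear parameter $\beta_t \sim \mathcal{N}(0, \Sigma_t)$, where $\Sigma_t$ is that task's covariance prior, and labels are generated as $y = x^\top \beta_t + \varepsilon$ for features $x \in \mathbb{R}^d$. A shared linear representation $W \in \mathbb{R}^{k \times d}$, with $k$ playing the role of model size, is chosen to minimize the average few-shot test risk across tasks:

$$\min_{W \in \mathbb{R}^{k \times d}} \; \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\!\left[\mathcal{R}_t(W)\right],$$

where $\mathcal{R}_t(W)$ is the test risk of a task-specific head fit on a few samples of task $t$ on top of the features $Wx$. In a setup of this kind, tasks whose $\Sigma_t$ has a dominant eigenspectrum claim the $k$ representation directions first; this is one way to read the “task competition” mechanism, and it suggests how a weaker task's risk can worsen as $k$ grows (inverse scaling) before the representation eventually accommodates it.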

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-e-ildiz24a,
  title     = {Understanding Inverse Scaling and Emergence in Multitask Representation Learning},
  author    = {E Ildiz, Muhammed and Zhao, Zhe and Oymak, Samet},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {4726--4734},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/e-ildiz24a/e-ildiz24a.pdf},
  url       = {https://proceedings.mlr.press/v238/e-ildiz24a.html}
}
Endnote
%0 Conference Paper
%T Understanding Inverse Scaling and Emergence in Multitask Representation Learning
%A Muhammed E Ildiz
%A Zhe Zhao
%A Samet Oymak
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-e-ildiz24a
%I PMLR
%P 4726--4734
%U https://proceedings.mlr.press/v238/e-ildiz24a.html
%V 238
APA
E Ildiz, M., Zhao, Z., & Oymak, S. (2024). Understanding Inverse Scaling and Emergence in Multitask Representation Learning. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:4726-4734. Available from https://proceedings.mlr.press/v238/e-ildiz24a.html.
