On the Power of Multitask Representation Learning with Gradient Descent

Qiaobo Li, Zixiang Chen, Yihe Deng, Yiwen Kou, Yuan Cao, Quanquan Gu
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:4357-4365, 2025.

Abstract

Representation learning, particularly multi-task representation learning, has gained widespread popularity in various deep learning applications, ranging from computer vision to natural language processing, due to its remarkable generalization performance. Despite its growing use, our understanding of the underlying mechanisms remains limited. In this paper, we provide a theoretical analysis elucidating why multi-task representation learning outperforms its single-task counterpart in scenarios involving over-parameterized two-layer convolutional neural networks trained by gradient descent. Our analysis is based on a data model that encompasses both task-shared and task-specific features, a setting commonly encountered in real-world applications. We also present experiments on synthetic and real-world data to illustrate and validate our theoretical findings.
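
The abstract describes a concrete setting: patch-based data containing both task-shared and task-specific features, fed to an over-parameterized two-layer CNN whose filters are shared across tasks and trained by gradient descent. This page contains no code, so the following is only a rough sketch of that setting; every name, dimension, activation choice, and loss function below is an assumption for illustration, not the authors' implementation.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative dimensions (not taken from the paper).
d, P = 16, 2                  # patch dimension, patches per example
n_tasks, n_per_task = 4, 64   # number of tasks, examples per task

# One task-shared feature plus one task-specific feature per task,
# all mutually orthogonal (a common assumption in this line of work).
basis = torch.linalg.qr(torch.randn(d, n_tasks + 1))[0]
v_shared, v_task = basis[:, 0], basis[:, 1:].T      # (d,), (n_tasks, d)

def make_task(t):
    """Each example: one signal patch (shared + task-specific feature) and one noise patch."""
    y = torch.randint(0, 2, (n_per_task,)) * 2 - 1  # labels in {-1, +1}
    signal = y[:, None] * (v_shared + v_task[t])[None, :]
    noise = 0.1 * torch.randn(n_per_task, d)
    X = torch.stack([signal, noise], dim=1)         # (n, P, d)
    return X.float(), y.float()

class TwoLayerCNN(nn.Module):
    """Shared first-layer filters, task-specific linear heads, cubic activation (assumed)."""
    def __init__(self, width=20):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(width, d))        # shared conv filters
        self.heads = nn.Parameter(torch.randn(n_tasks, width) / width)

    def forward(self, X, t):
        acts = torch.einsum("npd,md->npm", X, self.W) ** 3  # per-patch activations
        return acts.sum(dim=1) @ self.heads[t]              # sum-pool, then task head

data = [make_task(t) for t in range(n_tasks)]
model = TwoLayerCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.05)  # full-batch, i.e. plain gradient descent

for step in range(500):
    opt.zero_grad()
    # Multi-task training: the shared filters receive gradients from every task,
    # while each head only sees its own task's logistic loss.
    loss = sum(
        nn.functional.softplus(-y * model(X, t)).mean()
        for t, (X, y) in enumerate(data)
    )
    loss.backward()
    opt.step()

print(f"final multi-task loss: {loss.item():.4f}")

The single-task baseline the paper compares against would train the same architecture on one task's data alone, so the shared filters see gradients from only one task's features.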

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-li25i,
  title     = {On the Power of Multitask Representation Learning with Gradient Descent},
  author    = {Li, Qiaobo and Chen, Zixiang and Deng, Yihe and Kou, Yiwen and Cao, Yuan and Gu, Quanquan},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {4357--4365},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/li25i/li25i.pdf},
  url       = {https://proceedings.mlr.press/v258/li25i.html},
  abstract  = {Representation learning, particularly multi-task representation learning, has gained widespread popularity in various deep learning applications, ranging from computer vision to natural language processing, due to its remarkable generalization performance. Despite its growing use, our understanding of the underlying mechanisms remains limited. In this paper, we provide a theoretical analysis elucidating why multi-task representation learning outperforms its single-task counterpart in scenarios involving over-parameterized two-layer convolutional neural networks trained by gradient descent. Our analysis is based on a data model that encompasses both task-shared and task-specific features, a setting commonly encountered in real-world applications. We also present experiments on synthetic and real-world data to illustrate and validate our theoretical findings.}
}
Endnote
%0 Conference Paper
%T On the Power of Multitask Representation Learning with Gradient Descent
%A Qiaobo Li
%A Zixiang Chen
%A Yihe Deng
%A Yiwen Kou
%A Yuan Cao
%A Quanquan Gu
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-li25i
%I PMLR
%P 4357--4365
%U https://proceedings.mlr.press/v258/li25i.html
%V 258
%X Representation learning, particularly multi-task representation learning, has gained widespread popularity in various deep learning applications, ranging from computer vision to natural language processing, due to its remarkable generalization performance. Despite its growing use, our understanding of the underlying mechanisms remains limited. In this paper, we provide a theoretical analysis elucidating why multi-task representation learning outperforms its single-task counterpart in scenarios involving over-parameterized two-layer convolutional neural networks trained by gradient descent. Our analysis is based on a data model that encompasses both task-shared and task-specific features, a setting commonly encountered in real-world applications. We also present experiments on synthetic and real-world data to illustrate and validate our theoretical findings.
APA
Li, Q., Chen, Z., Deng, Y., Kou, Y., Cao, Y. & Gu, Q. (2025). On the Power of Multitask Representation Learning with Gradient Descent. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:4357-4365. Available from https://proceedings.mlr.press/v258/li25i.html.