A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

Nikunj Saunshi; Yi Zhang; Mikhail Khodak; Sanjeev Arora

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8512-8521, 2020.

Abstract

One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for \emph{convex models}. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any \emph{initialization-based meta-learning} algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $O(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-saunshi20a,
  title = 	 {A Sample Complexity Separation between Non-Convex and Convex Meta-Learning},
  author =       {Saunshi, Nikunj and Zhang, Yi and Khodak, Mikhail and Arora, Sanjeev},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {8512--8521},
  year = 	 {2020},
  editor = 	 {III, Hal Daumé and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/saunshi20a/saunshi20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/saunshi20a.html},
  abstract = 	 {One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for \emph{convex models}. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any \emph{initialization-based meta-learning} algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $O(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.}
}

Endnote

%0 Conference Paper
%T A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
%A Nikunj Saunshi
%A Yi Zhang
%A Mikhail Khodak
%A Sanjeev Arora
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh	
%F pmlr-v119-saunshi20a
%I PMLR
%P 8512--8521
%U https://proceedings.mlr.press/v119/saunshi20a.html
%V 119
%X One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for \emph{convex models}. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any \emph{initialization-based meta-learning} algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $O(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.

APA

Saunshi, N., Zhang, Y., Khodak, M. & Arora, S.. (2020). A Sample Complexity Separation between Non-Convex and Convex Meta-Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8512-8521 Available from https://proceedings.mlr.press/v119/saunshi20a.html.

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

Abstract

Cite this Paper

Related Material