A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8512-8521, 2020.

Abstract

One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for \emph{convex models}. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any \emph{initialization-based meta-learning} algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $O(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
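
To make the setup concrete, here is a minimal sketch (Python/NumPy) of a toy version of this setting: every task is linear regression whose target weight vector is a scalar multiple of a single shared unit vector w_star (the one-dimensional subspace), and a two-layer linear network is meta-trained with a Reptile-style loop that adapts the current initialization on a sampled task for a few gradient steps, then moves the initialization toward the adapted weights. All dimensions, learning rates, and step counts below are illustrative assumptions, not the construction or hyperparameters analyzed in the paper.

import numpy as np

rng = np.random.default_rng(0)
d, k = 50, 10                        # input dimension and hidden width (illustrative)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)     # shared direction: all task regressors lie in span{w_star}

def sample_data(alpha, n):
    """Sample n points from the task whose regressor is alpha * w_star."""
    X = rng.standard_normal((n, d))
    y = X @ (alpha * w_star) + 0.01 * rng.standard_normal(n)
    return X, y

def predict(W1, w2, X):
    """Two-layer linear network f(x) = w2^T (W1 x)."""
    return (X @ W1.T) @ w2

def adapt(W1, w2, X, y, lr=0.01, steps=20):
    """A few gradient steps on the squared loss, starting from the given initialization."""
    W1, w2 = W1.copy(), w2.copy()
    for _ in range(steps):
        r = predict(W1, w2, X) - y           # residuals, shape (n,)
        gW1 = np.outer(w2, r @ X) / len(y)   # gradient of 0.5 * mean residual^2 w.r.t. W1
        gw2 = W1 @ (r @ X) / len(y)          # gradient w.r.t. w2
        W1 -= lr * gW1
        w2 -= lr * gw2
    return W1, w2

# Reptile-style meta-training: nudge the initialization toward task-adapted weights.
W1 = rng.standard_normal((k, d)) / np.sqrt(d)
w2 = rng.standard_normal(k) / np.sqrt(k)
meta_lr = 0.1
for _ in range(2000):
    alpha = rng.uniform(-2, 2)               # a fresh training task
    X, y = sample_data(alpha, n=50)
    W1_t, w2_t = adapt(W1, w2, X, y)
    W1 += meta_lr * (W1_t - W1)
    w2 += meta_lr * (w2_t - w2)

# New task: adapt from the meta-learned initialization using only a couple of samples.
alpha_new = rng.uniform(-2, 2)
X_few, y_few = sample_data(alpha_new, n=2)
W1_a, w2_a = adapt(W1, w2, X_few, y_few)
X_test, y_test = sample_data(alpha_new, n=1000)
print("few-shot test MSE:", np.mean((predict(W1_a, w2_a, X_test) - y_test) ** 2))

In a successful run, the meta-learned initialization lets the network fit a new task from just a couple of samples, mirroring the $O(1)$ new-task sample complexity the abstract describes; the paper's actual argument is a training-dynamics analysis showing that these methods meta-learn the correct subspace onto which the data should be projected.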

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-saunshi20a,
  title     = {A Sample Complexity Separation between Non-Convex and Convex Meta-Learning},
  author    = {Saunshi, Nikunj and Zhang, Yi and Khodak, Mikhail and Arora, Sanjeev},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {8512--8521},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/saunshi20a/saunshi20a.pdf},
  url       = {https://proceedings.mlr.press/v119/saunshi20a.html},
  abstract  = {One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for \emph{convex models}. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any \emph{initialization-based meta-learning} algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $O(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.}
}
Endnote
%0 Conference Paper
%T A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
%A Nikunj Saunshi
%A Yi Zhang
%A Mikhail Khodak
%A Sanjeev Arora
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-saunshi20a
%I PMLR
%P 8512--8521
%U https://proceedings.mlr.press/v119/saunshi20a.html
%V 119
%X One popular trend in meta-learning is to learn from many training tasks a common initialization that a gradient-based method can use to solve a new task with few samples. The theory of meta-learning is still in its early stages, with several recent learning-theoretic analyses of methods such as Reptile [Nichol et al., 2018] being for \emph{convex models}. This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any \emph{initialization-based meta-learning} algorithm is $\Omega(d)$, where $d$ is the input dimension. In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $O(1)$, demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected.
APA
Saunshi, N., Zhang, Y., Khodak, M., & Arora, S. (2020). A Sample Complexity Separation between Non-Convex and Convex Meta-Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8512-8521. Available from https://proceedings.mlr.press/v119/saunshi20a.html.
