When MAML Can Adapt Fast and How to Assist When It Cannot

Sébastien Arnold, Shariq Iqbal, Fei Sha
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:244-252, 2021.

Abstract

Model-Agnostic Meta-Learning (MAML) and its variants have achieved success in meta-learning tasks on many datasets and settings. Nonetheless, we have just started to understand and analyze how they are able to adapt fast to new tasks. In this work, we contribute by conducting a series of empirical and theoretical studies, and discover several interesting, previously unknown properties of the algorithm. First, we find MAML adapts better with a deep architecture even if the tasks need only a shallow one. Secondly, linear layers can be added to the output layers of a shallower model to increase the depth without altering the modelling capacity, leading to improved performance in adaptation. Alternatively, an external and separate neural network meta-optimizer can also be used to transform the gradient updates of a smaller model so as to obtain improved performances in adaptation. Drawing from these evidences, we theorize that for a deep neural network to meta-learn well, the upper layers must transform the gradients of the bottom layers as if the upper layers were an external meta-optimizer, operating on a smaller network that is composed of the bottom layers.

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-arnold21a, title = { When MAML Can Adapt Fast and How to Assist When It Cannot }, author = {Arnold, S{\'e}bastien M. R. and Iqbal, Shariq and Sha, Fei}, booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics}, pages = {244--252}, year = {2021}, editor = {Banerjee, Arindam and Fukumizu, Kenji}, volume = {130}, series = {Proceedings of Machine Learning Research}, month = {13--15 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v130/arnold21a/arnold21a.pdf}, url = {http://proceedings.mlr.press/v130/arnold21a.html}, abstract = { Model-Agnostic Meta-Learning (MAML) and its variants have achieved success in meta-learning tasks on many datasets and settings. Nonetheless, we have just started to understand and analyze how they are able to adapt fast to new tasks. In this work, we contribute by conducting a series of empirical and theoretical studies, and discover several interesting, previously unknown properties of the algorithm. First, we find MAML adapts better with a deep architecture even if the tasks need only a shallow one. Secondly, linear layers can be added to the output layers of a shallower model to increase the depth without altering the modelling capacity, leading to improved performance in adaptation. Alternatively, an external and separate neural network meta-optimizer can also be used to transform the gradient updates of a smaller model so as to obtain improved performances in adaptation. Drawing from these evidences, we theorize that for a deep neural network to meta-learn well, the upper layers must transform the gradients of the bottom layers as if the upper layers were an external meta-optimizer, operating on a smaller network that is composed of the bottom layers. } }
Endnote
%0 Conference Paper %T When MAML Can Adapt Fast and How to Assist When It Cannot %A Sébastien Arnold %A Shariq Iqbal %A Fei Sha %B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2021 %E Arindam Banerjee %E Kenji Fukumizu %F pmlr-v130-arnold21a %I PMLR %P 244--252 %U http://proceedings.mlr.press/v130/arnold21a.html %V 130 %X Model-Agnostic Meta-Learning (MAML) and its variants have achieved success in meta-learning tasks on many datasets and settings. Nonetheless, we have just started to understand and analyze how they are able to adapt fast to new tasks. In this work, we contribute by conducting a series of empirical and theoretical studies, and discover several interesting, previously unknown properties of the algorithm. First, we find MAML adapts better with a deep architecture even if the tasks need only a shallow one. Secondly, linear layers can be added to the output layers of a shallower model to increase the depth without altering the modelling capacity, leading to improved performance in adaptation. Alternatively, an external and separate neural network meta-optimizer can also be used to transform the gradient updates of a smaller model so as to obtain improved performances in adaptation. Drawing from these evidences, we theorize that for a deep neural network to meta-learn well, the upper layers must transform the gradients of the bottom layers as if the upper layers were an external meta-optimizer, operating on a smaller network that is composed of the bottom layers.
APA
Arnold, S., Iqbal, S. & Sha, F.. (2021). When MAML Can Adapt Fast and How to Assist When It Cannot . Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:244-252 Available from http://proceedings.mlr.press/v130/arnold21a.html.

Related Material