Efficient Variance Reduction for Meta-learning

Hansi Yang, James Kwok
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:25070-25095, 2022.

Abstract

Meta-learning tries to learn meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (from each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation, which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters required by general bilevel variance reduction methods. Theoretical results show that it has a fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.
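The abstract gives no pseudocode, so the following is only a minimal illustrative sketch of the general idea it describes: a first-order Reptile-style meta-update combined with a variance-reduced (STORM-style recursive momentum) estimate of the meta-gradient, which needs to store only the previous meta-parameters rather than per-task parameters. This is not the authors' algorithm; the sine-regression task family, random-feature model, inner-loop settings, and the mixing weight a are all assumptions chosen to keep the example small and runnable.

import numpy as np

rng = np.random.default_rng(0)
h = 32
W_feat, b_feat = rng.normal(size=(1, h)), rng.normal(size=h)   # fixed random features

def sample_task(n=10):
    """Hypothetical task family: 1-D sine regression with random amplitude/phase."""
    amp, phase = rng.uniform(0.5, 2.0), rng.uniform(0.0, np.pi)
    x = rng.uniform(-3.0, 3.0, size=(n, 1))
    return x, amp * np.sin(x + phase)

def grad(w, x, y):
    """Gradient of 0.5 * MSE for a linear head on tanh random features."""
    feats = np.tanh(x @ W_feat + b_feat)
    return feats.T @ (feats @ w - y) / len(x)

def reptile_direction(phi, x, y, inner_lr=0.05, inner_steps=5):
    """First-order Reptile meta-update direction (phi - w_adapted) on one sampled task."""
    w = phi.copy()
    for _ in range(inner_steps):
        w -= inner_lr * grad(w, x, y)
    return phi - w

phi = np.zeros((h, 1))          # meta-parameters
phi_prev, d = phi.copy(), None  # extra state: previous meta-parameters and running direction
meta_lr, a = 0.1, 0.3           # a: mixing weight for the correction (assumed value)

for t in range(500):
    x, y = sample_task()
    g_cur = reptile_direction(phi, x, y)
    if d is None:
        d = g_cur                       # first iteration: plain stochastic direction
    else:
        # STORM-style recursive correction: the same sampled task is re-evaluated
        # at the previous meta-parameters, so only phi_prev is kept across
        # meta-iterations -- no task-specific parameters are stored.
        g_prev = reptile_direction(phi_prev, x, y)
        d = g_cur + (1.0 - a) * (d - g_prev)
    phi_prev = phi.copy()
    phi = phi - meta_lr * d

In this sketch the correction term (d - g_prev) reuses the same task data at two consecutive meta-iterates, which is what makes the running direction d a lower-variance estimate than the single-task Reptile direction alone.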

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-yang22g,
  title     = {Efficient Variance Reduction for Meta-learning},
  author    = {Yang, Hansi and Kwok, James},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {25070--25095},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/yang22g/yang22g.pdf},
  url       = {https://proceedings.mlr.press/v162/yang22g.html},
  abstract  = {Meta-learning tries to learn meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (from each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters in general bilevel variance reduction methods. Theoretical results show that it has fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.}
}
Endnote
%0 Conference Paper
%T Efficient Variance Reduction for Meta-learning
%A Hansi Yang
%A James Kwok
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-yang22g
%I PMLR
%P 25070--25095
%U https://proceedings.mlr.press/v162/yang22g.html
%V 162
%X Meta-learning tries to learn meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (from each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters in general bilevel variance reduction methods. Theoretical results show that it has fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.
APA
Yang, H. & Kwok, J. (2022). Efficient Variance Reduction for Meta-learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:25070-25095. Available from https://proceedings.mlr.press/v162/yang22g.html.

Related Material

Download PDF: https://proceedings.mlr.press/v162/yang22g/yang22g.pdf