Efficient Variance Reduction for Meta-learning

Hansi Yang; James Kwok

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:25070-25095, 2022.

Abstract

Meta-learning tries to learn meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (from each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation, which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters needed by general bilevel variance reduction methods. Theoretical results show that it has a fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-yang22g,
  title = 	 {Efficient Variance Reduction for Meta-learning},
  author =       {Yang, Hansi and Kwok, James},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {25070--25095},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/yang22g/yang22g.pdf},
  url = 	 {https://proceedings.mlr.press/v162/yang22g.html},
  abstract = 	 {Meta-learning tries to learn meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (from each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters in general bilevel variance reduction methods. Theoretical results show that it has fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.}
}

Endnote

%0 Conference Paper
%T Efficient Variance Reduction for Meta-learning
%A Hansi Yang
%A James Kwok
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-yang22g
%I PMLR
%P 25070--25095
%U https://proceedings.mlr.press/v162/yang22g.html
%V 162
%X Meta-learning tries to learn meta-knowledge from a large number of tasks. However, the stochastic meta-gradient can have large variance due to data sampling (from each task) and task sampling (from the whole task distribution), leading to slow convergence. In this paper, we propose a novel approach that integrates variance reduction with first-order meta-learning algorithms such as Reptile. It retains the bilevel formulation which better captures the structure of meta-learning, but does not require storing the vast number of task-specific parameters in general bilevel variance reduction methods. Theoretical results show that it has fast convergence rate due to variance reduction. Experiments on benchmark few-shot classification data sets demonstrate its effectiveness over state-of-the-art meta-learning algorithms with and without variance reduction.

APA


Yang, H. & Kwok, J. (2022). Efficient Variance Reduction for Meta-learning. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:25070-25095. Available from https://proceedings.mlr.press/v162/yang22g.html.
