Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1566-1575, 2019.

Abstract

We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.
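
To make the setup concrete, below is a minimal Python sketch of the two ingredients the abstract describes: within-task SGD on a loss regularized by (lam/2)*||w - h||^2 around a bias vector h, and an online meta-step that updates h from task to task. The least-squares loss, the 1/(lam*t) step size, and the simplified meta-step are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def biased_sgd(X, y, h, lam=1.0, steps=100):
    # Within-task SGD on a least-squares loss plus the biased
    # regularizer (lam / 2) * ||w - h||^2, where h is the bias vector.
    # Warm-starts at the bias and returns the averaged iterate, as is
    # standard for SGD on strongly convex objectives.
    w = h.copy()
    w_avg = np.zeros_like(w)
    for t in range(1, steps + 1):
        i = np.random.randint(len(y))    # sample one data point
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of the loss term
        grad += lam * (w - h)            # gradient of the biased regularizer
        w = w - grad / (lam * t)         # 1/(lam * t) step size (assumed)
        w_avg += (w - w_avg) / t         # running average of the iterates
    return w_avg

def meta_update(h, w_task, eta=0.1):
    # Simplified online meta-step (illustrative): nudge the bias toward
    # the weights returned on the most recent task.
    return h + eta * (w_task - h)

Across a stream of tasks one would alternate the two: run biased_sgd on the current task's data starting from the current bias h, then pass the returned weights to meta_update to refine h for the next task.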

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-denevi19a,
  title     = {Learning-to-Learn Stochastic Gradient Descent with Biased Regularization},
  author    = {Denevi, Giulia and Ciliberto, Carlo and Grazzi, Riccardo and Pontil, Massimiliano},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {1566--1575},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/denevi19a/denevi19a.pdf},
  url       = {https://proceedings.mlr.press/v97/denevi19a.html},
  abstract  = {We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.}
}
Endnote
%0 Conference Paper
%T Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
%A Giulia Denevi
%A Carlo Ciliberto
%A Riccardo Grazzi
%A Massimiliano Pontil
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-denevi19a
%I PMLR
%P 1566--1575
%U https://proceedings.mlr.press/v97/denevi19a.html
%V 97
%X We study the problem of learning-to-learn: inferring a learning algorithm that works well on a family of tasks sampled from an unknown distribution. As the class of algorithms we consider Stochastic Gradient Descent (SGD) on the true risk regularized by the squared Euclidean distance from a bias vector. We present an average excess risk bound for such a learning algorithm that quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then propose a novel meta-algorithm to estimate the bias term online from a sequence of observed tasks. The small memory footprint and low time complexity of our approach make it appealing in practice, while our theoretical analysis provides guarantees on the generalization properties of the meta-algorithm on new tasks. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by standard SGD without a bias term. Numerical experiments demonstrate the effectiveness of our approach in practice.
APA
Denevi, G., Ciliberto, C., Grazzi, R. & Pontil, M. (2019). Learning-to-Learn Stochastic Gradient Descent with Biased Regularization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1566-1575. Available from https://proceedings.mlr.press/v97/denevi19a.html.
