Better generalization with less data using robust gradient descent

Matthew Holland, Kazushi Ikeda
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2761-2770, 2019.

Abstract

For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.
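The abstract does not spell out the estimator, so the following is only a rough sketch of the general idea, under stated assumptions. Each coordinate of the risk gradient is estimated from the per-sample loss gradients with an M-estimator of location, and that estimate drives plain steepest descent on a least-squares objective. The M-estimator uses a Catoni-style influence function and is solved by fixed-point iteration. The influence function `psi`, the MAD-based scale heuristic in `scale`, the confidence parameter `delta`, the step size, and every function name here are hypothetical choices for illustration. They are not necessarily the estimator or settings used in the paper.

```python
import math
import statistics
from typing import List, Sequence


def psi(u: float) -> float:
    """Catoni-style influence function (assumed): close to u near 0, logarithmic in the tails."""
    return math.copysign(math.log1p(abs(u) + 0.5 * u * u), u)


def robust_mean(xs: Sequence[float], s: float, n_iter: int = 100, tol: float = 1e-10) -> float:
    """M-estimate of location: solve sum_i psi((x_i - theta) / s) = 0 by fixed-point iteration."""
    theta = statistics.median(xs)  # robust starting point
    if s <= 0.0:  # zero spread means all values are identical
        return theta
    for _ in range(n_iter):
        correction = s * sum(psi((x - theta) / s) for x in xs) / len(xs)
        theta += correction
        if abs(correction) <= tol * s:  # stop once updates are negligible relative to the scale
            break
    return theta


def scale(xs: Sequence[float], delta: float) -> float:
    """Hypothetical scale heuristic: robust spread (MAD) times sqrt(n / log(2 / delta))."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    spread = 1.4826 * mad if mad > 0 else statistics.pstdev(xs)
    return spread * math.sqrt(len(xs) / math.log(2.0 / delta))


def robust_gradient(per_sample_grads: List[List[float]], delta: float = 0.05) -> List[float]:
    """Coordinate-wise robust estimate of the risk gradient from per-sample loss gradients."""
    out = []
    for j in range(len(per_sample_grads[0])):
        col = [g[j] for g in per_sample_grads]
        out.append(robust_mean(col, scale(col, delta)))
    return out


def rgd_least_squares(X: List[List[float]], y: List[float], step: float = 0.1,
                      n_steps: int = 500, delta: float = 0.05) -> List[float]:
    """Steepest descent on the squared loss, driven by the robust gradient estimate."""
    w = [0.0] * len(X[0])
    for _ in range(n_steps):
        grads = []
        for xi, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, xi)) - yi  # residual
            grads.append([r * xj for xj in xi])  # gradient of 0.5 * r**2 with respect to w
        g = robust_gradient(grads, delta)
        w = [wj - step * gj for wj, gj in zip(w, g)]
    return w


# Usage: noiseless data y = 1 + 2x, with a bias feature in the first column.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [[1.0, x] for x in xs]
y = [1.0 + 2.0 * x for x in xs]
print(rgd_least_squares(X, y))  # approximately [1.0, 2.0]
```

Because `psi` grows only logarithmically, a single extreme per-sample gradient shifts the coordinate estimate far less than it would shift the sample mean. The plain steepest-descent step itself is left unchanged.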

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-holland19a,
  title = {Better generalization with less data using robust gradient descent},
  author = {Holland, Matthew and Ikeda, Kazushi},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {2761--2770},
  year = {2019},
  editor = {Kamalika Chaudhuri and Ruslan Salakhutdinov},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/holland19a/holland19a.pdf},
  url = {http://proceedings.mlr.press/v97/holland19a.html},
  abstract = {For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.}
}
Endnote
%0 Conference Paper
%T Better generalization with less data using robust gradient descent
%A Matthew Holland
%A Kazushi Ikeda
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-holland19a
%I PMLR
%P 2761--2770
%U http://proceedings.mlr.press/v97/holland19a.html
%V 97
%X For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.
APA
Holland, M. & Ikeda, K. (2019). Better generalization with less data using robust gradient descent. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2761-2770. Available from http://proceedings.mlr.press/v97/holland19a.html
