Better generalization with less data using robust gradient descent

Matthew Holland; Kazushi Ikeda

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2761-2770, 2019.

Abstract

For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.
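The idea of feeding a robust gradient estimate into a steepest descent update can be illustrated with a small sketch. The abstract does not specify the estimator, so this example substitutes a coordinate-wise median-of-means, which is a different and simpler robust estimator chosen only for illustration and is not claimed to be the authors' method. The squared loss, the function names (`median_of_means`, `robust_gradient`, `robust_gd`), and the parameters `k`, `step`, and `iters` are all assumptions made for this sketch.

```python
import statistics


def median_of_means(values, k):
    """Split values into k interleaved blocks, average each, return the median."""
    k = max(1, min(k, len(values)))
    blocks = [values[i::k] for i in range(k)]
    return statistics.median(sum(b) / len(b) for b in blocks)


def robust_gradient(w, data, k):
    """Coordinate-wise median-of-means of per-sample gradients.

    Assumes the squared loss 0.5 * (w.x - y)^2, whose per-sample
    gradient is (w.x - y) * x. This is an illustrative stand-in,
    not the estimator from the paper.
    """
    grads = []
    for x, y in data:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        grads.append([residual * xi for xi in x])
    return [median_of_means([g[j] for g in grads], k) for j in range(len(w))]


def robust_gd(data, dim, k=5, step=0.1, iters=200):
    """Plain gradient descent, with the sample-mean gradient replaced
    by the robust estimate above."""
    w = [0.0] * dim
    for _ in range(iters):
        g = robust_gradient(w, data, k)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical toy data: y = 3 * x, with one grossly corrupted response.
xs = [1.0, 2.0, 1.5, 0.5, 1.0, 2.0, 1.5, 0.5, 1.0, 1.0]
data = [([x], 3.0 * x) for x in xs]
data[-1] = ([1.0], 1000.0)  # single outlier

w_hat = robust_gd(data, dim=1)
```

In this toy setting the single corrupted observation lands in one block, and the median across block means discards it, so `w_hat[0]` settles near the true slope of 3. A sample-mean gradient on the same data would be pulled far away by the outlier.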

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-holland19a,
  title = 	 {Better generalization with less data using robust gradient descent},
  author =       {Holland, Matthew and Ikeda, Kazushi},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {2761--2770},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/holland19a/holland19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/holland19a.html},
  abstract = 	 {For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.}
}

Endnote

%0 Conference Paper
%T Better generalization with less data using robust gradient descent
%A Matthew Holland
%A Kazushi Ikeda
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-holland19a
%I PMLR
%P 2761--2770
%U https://proceedings.mlr.press/v97/holland19a.html
%V 97
%X For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.

APA


Holland, M. & Ikeda, K. (2019). Better generalization with less data using robust gradient descent. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2761-2770. Available from https://proceedings.mlr.press/v97/holland19a.html.
