Better generalization with less data using robust gradient descent
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2761-2770, 2019.
Abstract
For learning tasks where the data (or losses) may be heavy-tailed, algorithms based on empirical risk minimization may require a substantial number of observations in order to perform well off-sample. In pursuit of stronger performance under weaker assumptions, we propose a technique which uses a cheap and robust iterative estimate of the risk gradient, which can be easily fed into any steepest descent procedure. Finite-sample risk bounds are provided under weak moment assumptions on the loss gradient. The algorithm is simple to implement, and empirical tests using simulations and real-world data illustrate that more efficient and reliable learning is possible without prior knowledge of the loss tails.
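To make the general idea concrete, the following is a minimal sketch of gradient descent in which the usual sample-mean gradient is replaced by a robust estimate. It is not the estimator proposed in the paper: as a stand-in, it uses a coordinate-wise median-of-means over per-sample gradients, applied to a squared-error linear model. The names `mom_mean` and `robust_gd`, the block count `k`, the step size, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def mom_mean(grads, k, rng):
    """Coordinate-wise median-of-means of per-sample gradients, shape (n, d).

    Stand-in robust mean estimator (an assumption, not the paper's choice).
    """
    idx = rng.permutation(grads.shape[0])      # shuffle before splitting into blocks
    blocks = np.array_split(idx, k)            # k roughly equal blocks of indices
    block_means = np.stack([grads[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)      # median taken per coordinate


def robust_gd(X, y, steps=200, lr=0.1, k=10, seed=0):
    """Steepest descent on squared error, using a robust gradient estimate."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ w - y                   # shape (n,)
        grads = residual[:, None] * X          # per-sample gradient of 0.5 * (x.w - y)^2
        w = w - lr * mom_mean(grads, k, rng)   # robust estimate replaces the sample mean
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 500, 3
    X = rng.normal(size=(n, d))
    w_true = np.array([1.0, -2.0, 0.5])
    noise = rng.standard_t(df=2.5, size=n)     # heavy-tailed noise (hypothetical setup)
    y = X @ w_true + noise
    print("estimate:", robust_gd(X, y))
```

The only change relative to plain gradient descent is the line that aggregates per-sample gradients, which is what makes a robust estimate easy to plug into an existing descent loop.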