Choosing the Sample with Lowest Loss makes SGD Robust

Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2120-2130, 2020.

Abstract

The presence of outliers can significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD). In this paper we propose a simple variant of SGD: in each step, first choose a set of $k$ samples, then from these choose the one with the smallest current loss, and perform an SGD-like update with this chosen sample. Vanilla SGD corresponds to $k=1$, i.e. no choice; $k \geq 2$ yields a new algorithm that effectively minimizes a non-convex surrogate loss. Our main contribution is a theoretical analysis of the robustness properties of this idea for ML problems that are sums of convex losses; these results are backed up with synthetic and small-scale neural network experiments.
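The selection rule above can be sketched in a few lines. The following is an illustrative Python sketch for least-squares linear regression, not the authors' reference implementation; the function name, loss choice, and hyperparameters are assumptions made for the example.

```python
import numpy as np

def mkl_sgd(X, y, k=2, lr=0.05, epochs=20, seed=0):
    """Min-loss SGD sketch: at each step draw k candidate samples,
    keep the one with the smallest current loss, and take an SGD
    step on that sample alone. Squared loss is assumed here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs * n):
        idx = rng.choice(n, size=k, replace=False)       # candidate set of size k
        losses = 0.5 * (X[idx] @ w - y[idx]) ** 2        # current per-sample losses
        i = idx[np.argmin(losses)]                       # sample with lowest loss
        grad = (X[i] @ w - y[i]) * X[i]                  # gradient of squared loss
        w -= lr * grad                                   # SGD-like update
    return w
```

With `k=1` the candidate set has a single element and the update reduces to vanilla SGD; larger `k` biases the updates away from high-loss samples, which is what confers robustness to outliers.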

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-shah20a,
  title = {Choosing the Sample with Lowest Loss makes SGD Robust},
  author = {Shah, Vatsal and Wu, Xiaoxia and Sanghavi, Sujay},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = {2120--2130},
  year = {2020},
  editor = {Silvia Chiappa and Roberto Calandra},
  volume = {108},
  series = {Proceedings of Machine Learning Research},
  month = {26--28 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v108/shah20a/shah20a.pdf},
  url = {http://proceedings.mlr.press/v108/shah20a.html},
  abstract = {The presence of outliers can potentially significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD). In this paper we propose a simple variant of the simple SGD method: in each step, first choose a set of k samples, then from these choose the one with the smallest current loss, and do an SGD-like update with this chosen sample. Vanilla SGD corresponds to $k=1$, i.e. no choice; $k>=2$ represents a new algorithm that is however effectively minimizing a non-convex surrogate loss. Our main contribution is a theoretical analysis of the robustness properties of this idea for ML problems which are sums of convex losses; these are backed up with synthetic and small-scale neural network experiments.}
}
Endnote
%0 Conference Paper
%T Choosing the Sample with Lowest Loss makes SGD Robust
%A Vatsal Shah
%A Xiaoxia Wu
%A Sujay Sanghavi
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-shah20a
%I PMLR
%P 2120--2130
%U http://proceedings.mlr.press/v108/shah20a.html
%V 108
%X The presence of outliers can potentially significantly skew the parameters of machine learning models trained via stochastic gradient descent (SGD). In this paper we propose a simple variant of the simple SGD method: in each step, first choose a set of k samples, then from these choose the one with the smallest current loss, and do an SGD-like update with this chosen sample. Vanilla SGD corresponds to $k=1$, i.e. no choice; $k>=2$ represents a new algorithm that is however effectively minimizing a non-convex surrogate loss. Our main contribution is a theoretical analysis of the robustness properties of this idea for ML problems which are sums of convex losses; these are backed up with synthetic and small-scale neural network experiments.
APA
Shah, V., Wu, X. & Sanghavi, S. (2020). Choosing the Sample with Lowest Loss makes SGD Robust. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2120-2130. Available from http://proceedings.mlr.press/v108/shah20a.html.
