Minimizing the Maximal Loss: How and Why

Shai Shalev-Shwartz, Yonatan Wexler
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:793-801, 2016.

Abstract

A commonly used learning rule is to approximately minimize the average loss over the training set. Other learning algorithms, such as AdaBoost and hard-SVM, aim at minimizing the maximal loss over the training set. The average loss is more popular, particularly in deep learning, for three main reasons. First, it can be conveniently minimized using online algorithms that process a few examples at each iteration. Second, it is often argued that there is little point in minimizing the loss on the training set too aggressively, as the improvement will not be reflected in the generalization loss. Third, the maximal loss is not robust to outliers. In this paper we describe and analyze an algorithm that can convert any online algorithm into a minimizer of the maximal loss. We show, theoretically and empirically, that in some situations better accuracy on the training set is crucial for obtaining good performance on unseen examples. Finally, we propose robust versions of the approach that can handle outliers.
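
For concreteness, the two training objectives contrasted above can be written as follows. This is the standard textbook formulation rather than notation taken from the paper itself: w denotes the model parameters, m the number of training examples, and \ell(w; x_i, y_i) the loss on the i-th example.

  Average loss:  \min_{w} \; \frac{1}{m} \sum_{i=1}^{m} \ell(w; x_i, y_i)

  Maximal loss:  \min_{w} \; \max_{1 \le i \le m} \ell(w; x_i, y_i)

The second objective drives down the loss of the single worst-fit training example, which is why one outlier can dominate it; the robust variants mentioned in the abstract are motivated by exactly this sensitivity.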

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-shalev-shwartzb16,
  title     = {Minimizing the Maximal Loss: How and Why},
  author    = {Shalev-Shwartz, Shai and Wexler, Yonatan},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {793--801},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/shalev-shwartzb16.pdf},
  url       = {https://proceedings.mlr.press/v48/shalev-shwartzb16.html},
  abstract  = {A commonly used learning rule is to approximately minimize the \emph{average} loss over the training set. Other learning algorithms, such as AdaBoost and hard-SVM, aim at minimizing the \emph{maximal} loss over the training set. The average loss is more popular, particularly in deep learning, due to three main reasons. First, it can be conveniently minimized using online algorithms, that process few examples at each iteration. Second, it is often argued that there is no sense to minimize the loss on the training set too much, as it will not be reflected in the generalization loss. Last, the maximal loss is not robust to outliers. In this paper we describe and analyze an algorithm that can convert any online algorithm to a minimizer of the maximal loss. We show, theoretically and empirically, that in some situations better accuracy on the training set is crucial to obtain good performance on unseen examples. Last, we propose robust versions of the approach that can handle outliers.}
}
APA
Shalev-Shwartz, S. & Wexler, Y. (2016). Minimizing the Maximal Loss: How and Why. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:793-801. Available from https://proceedings.mlr.press/v48/shalev-shwartzb16.html.