Loss factorization, weakly supervised learning and label noise robustness

Giorgio Patrini; Frank Nielsen; Richard Nock; Marcello Carioni

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:708-717, 2016.

Abstract

We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the same loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator — the focal quantity of this work — which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application on weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.
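As an illustration of the factorization described in the abstract, the sketch below checks it numerically for the logistic loss with a linear model. It is not the authors' code. It relies on the identity log(1 + e^{-z}) - log(1 + e^{z}) = -z, which for labels in {-1, +1} splits the empirical risk into a label-free sum of the same loss over both signs, minus half the inner product of the model with the mean operator (the average of y_i x_i). The function names, the synthetic data, and the choice of logistic loss are assumptions made for this example.

```python
import math
import random


def logistic_loss(z):
    # Numerically stable log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def empirical_risk(theta, xs, ys):
    # Standard empirical risk: average loss on the margins y_i * <theta, x_i>.
    return sum(logistic_loss(y * dot(theta, x)) for x, y in zip(xs, ys)) / len(xs)


def mean_operator(xs, ys):
    # mu = (1/m) * sum_i y_i * x_i; the only quantity that touches the labels.
    m, d = len(xs), len(xs[0])
    return [sum(y * x[j] for x, y in zip(xs, ys)) / m for j in range(d)]


def factored_risk(theta, xs, mu):
    # Label-free term: the same loss summed over both possible signs.
    m = len(xs)
    label_free = sum(
        logistic_loss(s * dot(theta, x)) for x in xs for s in (-1, 1)
    ) / (2 * m)
    # Linear term: the labels enter only through the mean operator.
    return label_free - 0.5 * dot(theta, mu)


# Synthetic data (hypothetical, for illustration only).
rng = random.Random(0)
xs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
ys = [rng.choice((-1, 1)) for _ in xs]
theta = [0.5, -1.0, 2.0]

mu = mean_operator(xs, ys)
direct = empirical_risk(theta, xs, ys)
factored = factored_risk(theta, xs, mu)
print(abs(direct - factored) < 1e-9)
```

Because the labels appear only in `mu`, an estimate of the mean operator obtained from weak supervision could in principle be passed to `factored_risk` in place of the fully labeled one. How to obtain such an estimate is the subject of the paper and is not shown here.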

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-patrini16,
  title = 	 {Loss factorization, weakly supervised learning and label noise robustness},
  author = 	 {Patrini, Giorgio and Nielsen, Frank and Nock, Richard and Carioni, Marcello},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {708--717},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/patrini16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/patrini16.html},
  abstract = 	 {We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the same loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator — the focal quantity of this work — which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application on weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.}
}

Endnote

%0 Conference Paper
%T Loss factorization, weakly supervised learning and label noise robustness
%A Giorgio Patrini
%A Frank Nielsen
%A Richard Nock
%A Marcello Carioni
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-patrini16
%I PMLR
%P 708--717
%U https://proceedings.mlr.press/v48/patrini16.html
%V 48
%X We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the same loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator — the focal quantity of this work — which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application on weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.

RIS


TY  - CPAPER
TI  - Loss factorization, weakly supervised learning and label noise robustness
AU  - Giorgio Patrini
AU  - Frank Nielsen
AU  - Richard Nock
AU  - Marcello Carioni
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-patrini16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 708
EP  - 717
L1  - http://proceedings.mlr.press/v48/patrini16.pdf
UR  - https://proceedings.mlr.press/v48/patrini16.html
AB  - We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the same loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator — the focal quantity of this work — which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application on weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.
ER  -

APA


Patrini, G., Nielsen, F., Nock, R. & Carioni, M. (2016). Loss factorization, weakly supervised learning and label noise robustness. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:708-717. Available from https://proceedings.mlr.press/v48/patrini16.html.
