Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels

Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1929-1938, 2020.

Abstract

In weakly supervised learning, the unbiased risk estimator (URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions. Nevertheless, UREs lead to overfitting in many problem settings when the models are complex, such as deep networks. In this paper, we investigate the reasons for such overfitting by studying a weakly supervised problem called learning with complementary labels. We argue that the quality of gradient estimation matters more in risk minimization. Theoretically, we show that a URE gives an unbiased gradient estimator (UGE). In practice, however, UGEs may suffer from huge variance, which often leaves empirical gradients far from the true gradients during minimization. To this end, we propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance and makes empirical gradients more aligned with true gradients in direction. Thanks to this characteristic, SCL successfully mitigates the overfitting issue and improves URE-based methods.
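The bias/variance argument above can be made concrete with a small toy computation. This Python sketch is not the authors' code. It works with gradients with respect to the logits z of one hypothetical input whose true label y is hidden, and it rests on these assumptions:

1. Cross-entropy ℓ(z, j) = -log softmax(z)_j is the base loss.
2. Complementary labels ȳ are drawn uniformly from the K-1 classes other than y.
3. The URE uses the uniform-case per-sample loss -(K-1)ℓ(z, ȳ) + Σ_j ℓ(z, j).
4. A surrogate of the form -log(1 - p_ȳ) stands in for an SCL loss.

The logits are made up for illustration.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def true_grad(p, y):
    # Gradient of the ordinary loss -log p_y w.r.t. the logits: p - e_y.
    return [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]

def ure_grad(p, ybar):
    # ASSUMED URE per-sample loss: -(K-1)*CE(z, ybar) + sum_j CE(z, j).
    # Since dCE(z, j)/dz = p - e_j, the gradient is
    # -(K-1)(p - e_ybar) + (K*p - 1) = p + (K-1)*e_ybar - 1.
    K = len(p)
    return [pk + (K - 1) * (1.0 if k == ybar else 0.0) - 1.0
            for k, pk in enumerate(p)]

def scl_nl_grad(p, ybar):
    # ASSUMED surrogate -log(1 - p_ybar); since dp_ybar/dz_k = p_ybar*(e_ybar - p)_k,
    # its gradient is p_ybar * (e_ybar - p) / (1 - p_ybar).
    c = p[ybar] / (1.0 - p[ybar])
    return [c * ((1.0 if k == ybar else 0.0) - pk) for k, pk in enumerate(p)]

def mean(vecs):
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

def total_variance(vecs):
    # Trace of the covariance of the gradient vectors (population form).
    mu = mean(vecs)
    return sum(sum((v[k] - mu[k]) ** 2 for v in vecs)
               for k in range(len(mu))) / len(vecs)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

z = [2.0, 0.5, -1.0, 0.0, 1.0]  # hypothetical logits for one input (K = 5)
y = 0                           # its hidden true label
p = softmax(z)
cands = [k for k in range(len(p)) if k != y]  # possible complementary labels

# Exact expectations: enumerate every equally likely complementary label.
ure_all = [ure_grad(p, yb) for yb in cands]
scl_all = [scl_nl_grad(p, yb) for yb in cands]
g = true_grad(p, y)

print("URE bias:", max(abs(a - b) for a, b in zip(mean(ure_all), g)))
print("variance URE / SCL:", total_variance(ure_all), total_variance(scl_all))

# Monte Carlo: average direction agreement of minibatch gradients with g.
rng = random.Random(0)
for name, fn in [("URE", ure_grad), ("SCL-NL", scl_nl_grad)]:
    sims = []
    for _ in range(1000):
        batch = [fn(p, rng.choice(cands)) for _ in range(8)]
        sims.append(cosine(mean(batch), g))
    print(name, "mean cosine:", sum(sims) / len(sims))
```

Averaging the URE gradient over every possible ȳ reproduces the ordinary gradient p - e_y exactly. The individual gradients, however, have a total variance of (K-1)(K-2), which is 12 for K = 5. The surrogate's gradients are biased but far less spread out. The printed cosine similarities depend on the assumed logits and the batch size of 8. They are not results reported in the paper.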

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-chou20a,
  title     = {Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels},
  author    = {Chou, Yu-Ting and Niu, Gang and Lin, Hsuan-Tien and Sugiyama, Masashi},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {1929--1938},
  year      = {2020},
  editor    = {Daumé III, Hal and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/chou20a/chou20a.pdf},
  url       = {http://proceedings.mlr.press/v119/chou20a.html},
  abstract  = {In weakly supervised learning, the unbiased risk estimator (URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions. Nevertheless, UREs lead to overfitting in many problem settings when the models are complex, such as deep networks. In this paper, we investigate the reasons for such overfitting by studying a weakly supervised problem called learning with complementary labels. We argue that the quality of gradient estimation matters more in risk minimization. Theoretically, we show that a URE gives an unbiased gradient estimator (UGE). In practice, however, UGEs may suffer from huge variance, which often leaves empirical gradients far from the true gradients during minimization. To this end, we propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance and makes empirical gradients more aligned with true gradients in direction. Thanks to this characteristic, SCL successfully mitigates the overfitting issue and improves URE-based methods.}
}
Endnote
%0 Conference Paper
%T Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels
%A Yu-Ting Chou
%A Gang Niu
%A Hsuan-Tien Lin
%A Masashi Sugiyama
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-chou20a
%I PMLR
%P 1929-1938
%U http://proceedings.mlr.press/v119/chou20a.html
%V 119
%X In weakly supervised learning, the unbiased risk estimator (URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions. Nevertheless, UREs lead to overfitting in many problem settings when the models are complex, such as deep networks. In this paper, we investigate the reasons for such overfitting by studying a weakly supervised problem called learning with complementary labels. We argue that the quality of gradient estimation matters more in risk minimization. Theoretically, we show that a URE gives an unbiased gradient estimator (UGE). In practice, however, UGEs may suffer from huge variance, which often leaves empirical gradients far from the true gradients during minimization. To this end, we propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance and makes empirical gradients more aligned with true gradients in direction. Thanks to this characteristic, SCL successfully mitigates the overfitting issue and improves URE-based methods.
APA
Chou, Y.-T., Niu, G., Lin, H.-T., & Sugiyama, M. (2020). Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:1929-1938. Available from http://proceedings.mlr.press/v119/chou20a.html.
