Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces

Ankit Singh Rawat, Aditya K Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix Yu, Sashank Reddi, Sanjiv Kumar
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8890-8901, 2021.

Abstract

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.
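
To make the mechanism described in the abstract concrete, below is a minimal NumPy sketch (not the paper's implementation) contrasting the full softmax cross-entropy over all labels with a sampled approximation that corrects for sampling bias, together with a prior-based logit adjustment, one common loss modification for labeling bias. The function names, the uniform negative-sampling distribution, and the Zipf-style label priors are illustrative assumptions.

# A minimal sketch, not the paper's method: sampled-softmax correction for
# sampling bias and a logit adjustment for labeling bias. Names are illustrative.
import numpy as np

def full_softmax_loss(scores, y):
    # Cross-entropy against all L labels: O(L) work per example.
    return np.logaddexp.reduce(scores) - scores[y]

def sampled_softmax_loss(scores, y, neg_idx, neg_probs):
    # Approximate the loss with the positive plus a few sampled negatives.
    # Each negative's score is shifted by -log(m * q_j), the usual
    # sampled-softmax correction for the bias introduced by drawing
    # m negatives from the sampling distribution q.
    m = len(neg_idx)
    corrected = scores[neg_idx] - np.log(m * neg_probs[neg_idx])
    pooled = np.concatenate(([scores[y]], corrected))
    return np.logaddexp.reduce(pooled) - scores[y]

def logit_adjusted_scores(scores, class_priors, tau=1.0):
    # One common loss modification for label imbalance: add tau * log(prior)
    # to the scores inside the training loss, encouraging larger margins
    # on rare labels (logit adjustment).
    return scores + tau * np.log(class_priors)

rng = np.random.default_rng(0)
L, m, y = 50_000, 256, 7                    # large label space, few sampled negatives
scores = rng.normal(size=L)
q = np.full(L, 1.0 / L)                     # assumed uniform negative-sampling distribution
neg = rng.choice(L, size=m, replace=False, p=q)
neg = neg[neg != y]                         # avoid double-counting the positive
priors = np.arange(1, L + 1, dtype=float) ** -1.0
priors /= priors.sum()                      # assumed Zipf-style long-tail label priors

print(full_softmax_loss(scores, y))
print(sampled_softmax_loss(scores, y, neg, q))
print(full_softmax_loss(logit_adjusted_scores(scores, priors), y))

The first two prints illustrate the efficiency/bias trade: the sampled loss touches only m + 1 scores but needs the per-negative correction; the third applies the prior-based adjustment that the abstract's loss-modification techniques build on.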

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-rawat21a,
  title     = {Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces},
  author    = {Rawat, Ankit Singh and Menon, Aditya K and Jitkrittum, Wittawat and Jayasumana, Sadeep and Yu, Felix and Reddi, Sashank and Kumar, Sanjiv},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8890--8901},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/rawat21a/rawat21a.pdf},
  url       = {https://proceedings.mlr.press/v139/rawat21a.html},
  abstract  = {Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.}
}
Endnote
%0 Conference Paper
%T Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces
%A Ankit Singh Rawat
%A Aditya K Menon
%A Wittawat Jitkrittum
%A Sadeep Jayasumana
%A Felix Yu
%A Sashank Reddi
%A Sanjiv Kumar
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-rawat21a
%I PMLR
%P 8890--8901
%U https://proceedings.mlr.press/v139/rawat21a.html
%V 139
%X Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.
APA
Rawat, A.S., Menon, A.K., Jitkrittum, W., Jayasumana, S., Yu, F., Reddi, S. & Kumar, S. (2021). Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8890-8901. Available from https://proceedings.mlr.press/v139/rawat21a.html.

Related Material

Download PDF: http://proceedings.mlr.press/v139/rawat21a/rawat21a.pdf