Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces

Ankit Singh Rawat; Aditya K Menon; Wittawat Jitkrittum; Sadeep Jayasumana; Felix Yu; Sashank Reddi; Sanjiv Kumar

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8890-8901, 2021.

Abstract

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v139-rawat21a,
  title = {Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces},
  author = {Rawat, Ankit Singh and Menon, Aditya K and Jitkrittum, Wittawat and Jayasumana, Sadeep and Yu, Felix and Reddi, Sashank and Kumar, Sanjiv},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {8890--8901},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/rawat21a/rawat21a.pdf},
  url = {https://proceedings.mlr.press/v139/rawat21a.html},
  abstract = {Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.}
}

Endnote

%0 Conference Paper
%T Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces
%A Ankit Singh Rawat
%A Aditya K Menon
%A Wittawat Jitkrittum
%A Sadeep Jayasumana
%A Felix Yu
%A Sashank Reddi
%A Sanjiv Kumar
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-rawat21a
%I PMLR
%P 8890--8901
%U https://proceedings.mlr.press/v139/rawat21a.html
%V 139
%X Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.

APA


Rawat, A.S., Menon, A.K., Jitkrittum, W., Jayasumana, S., Yu, F., Reddi, S. & Kumar, S. (2021). Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8890-8901. Available from https://proceedings.mlr.press/v139/rawat21a.html.
