Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Florian E. Dorner, Moritz Hardt
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:11544-11572, 2024.

Abstract

We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It’s common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it’s best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér’s theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding’s bound.
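The trade-off described in the abstract lends itself to a quick Monte Carlo sketch (not from the paper; the classifier accuracies, annotator accuracy, and budget below are illustrative assumptions). It compares spending a fixed label budget on many singly-labeled points (k=1) versus fewer points with majority-vote labels (k=3, 5), and checks how often the noisy comparison identifies the truly better classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the paper): classifier A is truly
# better than B, and each purchased label is correct with probability Q.
P_A, P_B = 0.75, 0.70   # true accuracies of the two classifiers
Q = 0.8                 # accuracy of a single noisy annotator label
BUDGET = 3000           # total number of noisy labels we can afford
TRIALS = 2000           # Monte Carlo repetitions

def a_wins(n_points, k_labels):
    """One comparison: n_points examples, k_labels noisy labels per example,
    aggregated by majority vote (k_labels odd). Returns True iff the
    noisy-label comparison correctly picks A as the better classifier."""
    a_correct = rng.random(n_points) < P_A
    b_correct = rng.random(n_points) < P_B
    votes_correct = rng.random((n_points, k_labels)) < Q
    majority_correct = votes_correct.sum(axis=1) * 2 > k_labels
    # A classifier agrees with the aggregated label iff both are correct
    # or both are wrong with respect to the true label.
    a_agree = (a_correct == majority_correct).sum()
    b_agree = (b_correct == majority_correct).sum()
    return a_agree > b_agree

win_rate = {}
for k in (1, 3, 5):
    n = BUDGET // k  # spend the whole budget: fewer points as k grows
    win_rate[k] = sum(a_wins(n, k) for _ in range(TRIALS)) / TRIALS
    print(f"k={k} labels/point (n={n} points): "
          f"A identified {win_rate[k]:.1%} of the time")
```

Under these assumed parameters, k=1 (one label per point, maximal sample size) identifies the better classifier most often, in line with the paper's theorem; the exact rates depend on the assumed accuracies.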

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-dorner24a,
  title     = {Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget},
  author    = {Dorner, Florian E. and Hardt, Moritz},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {11544--11572},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/dorner24a/dorner24a.pdf},
  url       = {https://proceedings.mlr.press/v235/dorner24a.html},
  abstract  = {We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It’s common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it’s best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér’s theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding’s bound.}
}
Endnote
%0 Conference Paper
%T Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
%A Florian E. Dorner
%A Moritz Hardt
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-dorner24a
%I PMLR
%P 11544--11572
%U https://proceedings.mlr.press/v235/dorner24a.html
%V 235
%X We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It’s common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it’s best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér’s theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding’s bound.
APA
Dorner, F.E. & Hardt, M. (2024). Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:11544-11572. Available from https://proceedings.mlr.press/v235/dorner24a.html.