Sharp error bounds for imbalanced classification: how many examples in the minority class?

Anass Aghbalou, Anne Sabourin, François Portier
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:838-846, 2024.

Abstract

When dealing with imbalanced classification data, reweighting the loss function is a standard procedure that equilibrates the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a central challenge of the imbalanced classification framework: the negligible size of one class relative to the full sample size, and the resulting need to rescale the risk function by a probability tending to zero. To address this gap, we present two novel contributions in the setting where the rare-class probability approaches zero: (1) a non-asymptotic fast-rate probability bound for constrained balanced empirical risk minimization, and (2) a consistent upper bound for balanced nearest neighbor estimates. Our findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.
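To make the reweighting concrete, here is a minimal Python sketch (not code from the paper) of the balanced empirical risk: each class-conditional error rate is rescaled by its empirical class probability, so the minority-class term is divided by a quantity that tends to zero as the class becomes rarer, which is exactly the rescaling the paper's bounds must control. The synthetic data, names, and toy threshold classifier are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): balanced empirical risk via
# class reweighting, on synthetic imbalanced data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: minority class (y = 1) has small probability p.
n, p = 10_000, 0.01
y = (rng.random(n) < p).astype(int)             # labels in {0, 1}
x = rng.normal(loc=2.0 * y, scale=1.0, size=n)  # 1-D feature

def zero_one_loss(y_pred, y_true):
    return (y_pred != y_true).astype(float)

y_pred = (x > 1.0).astype(int)  # toy threshold classifier

# Standard empirical risk: the rare class contributes almost nothing,
# so a classifier can score well while ignoring it entirely.
risk = zero_one_loss(y_pred, y).mean()

# Balanced empirical risk: each class error is rescaled by its
# empirical class size, so both classes receive weight 1/2.
loss = zero_one_loss(y_pred, y)
n_pos, n_neg = (y == 1).sum(), (y == 0).sum()
balanced_risk = 0.5 * loss[y == 1].sum() / n_pos \
              + 0.5 * loss[y == 0].sum() / n_neg

print(f"standard risk: {risk:.4f}, balanced risk: {balanced_risk:.4f}")
```

Note the design point this illustrates: dividing by `n_pos` is, up to the factor 1/2, dividing by the empirical minority-class probability `n_pos / n`, and the paper's question is how large `n_pos` must be for this rescaled estimate to concentrate.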

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-aghbalou24a,
  title     = {Sharp error bounds for imbalanced classification: how many examples in the minority class?},
  author    = {Aghbalou, Anass and Sabourin, Anne and Portier, Fran\c{c}ois},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {838--846},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/aghbalou24a/aghbalou24a.pdf},
  url       = {https://proceedings.mlr.press/v238/aghbalou24a.html}
}
Endnote
%0 Conference Paper
%T Sharp error bounds for imbalanced classification: how many examples in the minority class?
%A Anass Aghbalou
%A Anne Sabourin
%A François Portier
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-aghbalou24a
%I PMLR
%P 838--846
%U https://proceedings.mlr.press/v238/aghbalou24a.html
%V 238
APA
Aghbalou, A., Sabourin, A. & Portier, F. (2024). Sharp error bounds for imbalanced classification: how many examples in the minority class? Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:838-846. Available from https://proceedings.mlr.press/v238/aghbalou24a.html.