Learning Imbalanced Data with Beneficial Label Noise

Guangzheng Hu, Feng Liu, Mingming Gong, Guanghui Wang, Liuhua Peng
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:24535-24569, 2025.

Abstract

Data imbalance is a common factor hindering classifier performance. Data-level approaches to imbalanced learning, such as resampling, often lead to information loss or generative errors. Building on theoretical studies of the imbalance ratio in binary classification, we find that adding suitable label noise can adjust biased decision boundaries and improve classifier performance. This paper proposes the Label-Noise-based Re-balancing (LNR) approach, which addresses imbalanced learning through a novel design of an asymmetric label noise model. In contrast to other data-level methods, LNR alleviates the issues of information loss and generative errors and can be integrated seamlessly with any classifier or algorithm-level method. We validate the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise.
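To make the core idea concrete, here is a minimal sketch of injecting asymmetric label noise into an imbalanced binary dataset. This is an illustration of the general technique the abstract describes, not the paper's LNR noise model: the one-directional flipping (majority to minority only), the flip probability, and the function name are all assumptions for demonstration.

```python
import numpy as np

def inject_asymmetric_label_noise(y, flip_prob=0.3, majority_label=0,
                                  minority_label=1, seed=0):
    """Flip a fraction of majority-class labels to the minority class.

    Illustrative only: the one-directional flip and fixed probability
    are assumptions, not the LNR model from the paper.
    """
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    # Indices of majority-class samples, each flipped with prob. flip_prob.
    majority_idx = np.where(y == majority_label)[0]
    flip_mask = rng.random(len(majority_idx)) < flip_prob
    y_noisy[majority_idx[flip_mask]] = minority_label
    return y_noisy

# Example: a 90:10 imbalanced label vector becomes less skewed after noise.
y = np.array([0] * 90 + [1] * 10)
y_noisy = inject_asymmetric_label_noise(y, flip_prob=0.3)
```

Because the noise is asymmetric (only majority labels flip), the effective class ratio seen by a downstream classifier is pushed toward balance without discarding samples (as undersampling does) or synthesizing new ones (as SMOTE-style oversampling does).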

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-hu25p,
  title     = {Learning Imbalanced Data with Beneficial Label Noise},
  author    = {Hu, Guangzheng and Liu, Feng and Gong, Mingming and Wang, Guanghui and Peng, Liuhua},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {24535--24569},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/hu25p/hu25p.pdf},
  url       = {https://proceedings.mlr.press/v267/hu25p.html},
  abstract  = {Data imbalance is a common factor hindering classifier performance. Data-level approaches for imbalanced learning, such as resampling, often lead to information loss or generative errors. Building on theoretical studies of imbalance ratio in binary classification, it is found that adding suitable label noise can adjust biased decision boundaries and improve classifier performance. This paper proposes the Label-Noise-based Re-balancing (LNR) approach to solve imbalanced learning by employing a novel design of an asymmetric label noise model. In contrast to other data-level methods, LNR alleviates the issues of informative loss and generative errors and can be integrated seamlessly with any classifier or algorithm-level method. We validated the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise.}
}
Endnote
%0 Conference Paper
%T Learning Imbalanced Data with Beneficial Label Noise
%A Guangzheng Hu
%A Feng Liu
%A Mingming Gong
%A Guanghui Wang
%A Liuhua Peng
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-hu25p
%I PMLR
%P 24535--24569
%U https://proceedings.mlr.press/v267/hu25p.html
%V 267
%X Data imbalance is a common factor hindering classifier performance. Data-level approaches for imbalanced learning, such as resampling, often lead to information loss or generative errors. Building on theoretical studies of imbalance ratio in binary classification, it is found that adding suitable label noise can adjust biased decision boundaries and improve classifier performance. This paper proposes the Label-Noise-based Re-balancing (LNR) approach to solve imbalanced learning by employing a novel design of an asymmetric label noise model. In contrast to other data-level methods, LNR alleviates the issues of informative loss and generative errors and can be integrated seamlessly with any classifier or algorithm-level method. We validated the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise.
APA
Hu, G., Liu, F., Gong, M., Wang, G. & Peng, L. (2025). Learning Imbalanced Data with Beneficial Label Noise. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:24535-24569. Available from https://proceedings.mlr.press/v267/hu25p.html.