Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation

Shyam Nuggehalli, Jifan Zhang, Lalit K Jain, Robert D Nowak
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:46815-46836, 2025.

Abstract

Class imbalance severely impacts machine learning performance on minority classes in real-world applications. While various solutions exist, active learning offers a fundamental fix by strategically collecting balanced, informative labeled examples from abundant unlabeled data. We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation. By reducing the problem to one-dimensional active learning, DIRECT leverages established theory to handle batch labeling and label noise – another common challenge in data annotation that particularly affects active learning methods. Our work presents the first comprehensive study of active learning under both class imbalance and label noise. Extensive experiments on imbalanced datasets show DIRECT reduces annotation costs by over 60% compared to state-of-the-art active learning methods and over 80% versus random sampling, while maintaining robustness to label noise.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-nuggehalli25a,
  title = {Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation},
  author = {Nuggehalli, Shyam and Zhang, Jifan and Jain, Lalit K and Nowak, Robert D},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {46815--46836},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/nuggehalli25a/nuggehalli25a.pdf},
  url = {https://proceedings.mlr.press/v267/nuggehalli25a.html},
  abstract = {Class imbalance severely impacts machine learning performance on minority classes in real-world applications. While various solutions exist, active learning offers a fundamental fix by strategically collecting balanced, informative labeled examples from abundant unlabeled data. We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation. By reducing the problem to one-dimensional active learning, DIRECT leverages established theory to handle batch labeling and label noise – another common challenge in data annotation that particularly affects active learning methods. Our work presents the first comprehensive study of active learning under both class imbalance and label noise. Extensive experiments on imbalanced datasets show DIRECT reduces annotation costs by over 60% compared to state-of-the-art active learning methods and over 80% versus random sampling, while maintaining robustness to label noise.}
}
Endnote
%0 Conference Paper
%T Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation
%A Shyam Nuggehalli
%A Jifan Zhang
%A Lalit K Jain
%A Robert D Nowak
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-nuggehalli25a
%I PMLR
%P 46815--46836
%U https://proceedings.mlr.press/v267/nuggehalli25a.html
%V 267
%X Class imbalance severely impacts machine learning performance on minority classes in real-world applications. While various solutions exist, active learning offers a fundamental fix by strategically collecting balanced, informative labeled examples from abundant unlabeled data. We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation. By reducing the problem to one-dimensional active learning, DIRECT leverages established theory to handle batch labeling and label noise – another common challenge in data annotation that particularly affects active learning methods. Our work presents the first comprehensive study of active learning under both class imbalance and label noise. Extensive experiments on imbalanced datasets show DIRECT reduces annotation costs by over 60% compared to state-of-the-art active learning methods and over 80% versus random sampling, while maintaining robustness to label noise.
APA
Nuggehalli, S., Zhang, J., Jain, L.K. & Nowak, R.D. (2025). Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:46815-46836. Available from https://proceedings.mlr.press/v267/nuggehalli25a.html.
