SelectNet: Learning to Sample from the Wild for Imbalanced Data Training

Yunru Liu, Tingran Gao, Haizhao Yang
Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:193-206, 2020.

Abstract

Supervised learning from training data with imbalanced class sizes, a commonly encountered scenario in real applications such as anomaly/fraud detection, has long been considered a significant challenge in machine learning. Motivated by recent progress in curriculum and self-paced learning, we propose to adopt a semi-supervised learning paradigm by training a deep neural network, referred to as SelectNet, to selectively add unlabelled data together with their predicted labels to the training dataset. Unlike existing techniques designed to tackle the difficulty of class-imbalanced training data, such as resampling, cost-sensitive learning, and margin-based learning, SelectNet provides an end-to-end approach for learning from important unlabelled data “in the wild” that most likely belong to the under-sampled classes in the training data, thus gradually mitigating the imbalance in the data used for training the classifier. We demonstrate the efficacy of SelectNet through extensive numerical experiments on standard datasets in computer vision.
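To make the selection idea concrete, here is a minimal hand-written sketch of the kind of selective pseudo-labelling loop the abstract describes. All names (`select_for_training`, the confidence threshold, the scarcity rule) are illustrative assumptions, not from the paper; in SelectNet itself the selection rule is a learned network trained end-to-end, not the fixed heuristic below.

```python
# Hypothetical sketch: add confident pseudo-labelled samples from the wild,
# but only for classes that are under-represented in the training set.
# In the actual paper this selection is performed by a learned network.
from collections import Counter

def select_for_training(unlabelled_preds, train_labels, threshold=0.9):
    """Pick unlabelled samples whose predicted class is under-represented.

    unlabelled_preds: list of (sample_id, predicted_class, confidence)
    train_labels: labels currently in the (imbalanced) training set
    Returns (sample_id, pseudo_label) pairs to append to the training set.
    """
    counts = Counter(train_labels)
    majority = max(counts.values())
    selected = []
    for sample_id, cls, conf in unlabelled_preds:
        # Only trust confident predictions, and only add samples that
        # shrink the class imbalance rather than amplify it.
        if conf >= threshold and counts[cls] < majority:
            selected.append((sample_id, cls))
            counts[cls] += 1  # book-keep so we stop once balance is reached
    return selected

train_labels = ["normal"] * 8 + ["fraud"] * 2
preds = [
    ("u1", "fraud", 0.95),   # confident, minority class -> selected
    ("u2", "normal", 0.99),  # confident but majority class -> skipped
    ("u3", "fraud", 0.60),   # minority class but low confidence -> skipped
]
print(select_for_training(preds, train_labels))  # → [('u1', 'fraud')]
```

Repeating this loop while retraining the classifier gradually rebalances the effective training set, which is the dynamic the abstract describes.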

Cite this Paper


BibTeX
@InProceedings{pmlr-v107-liu20a,
  title     = {{SelectNet: L}earning to Sample from the Wild for Imbalanced Data Training},
  author    = {Liu, Yunru and Gao, Tingran and Yang, Haizhao},
  booktitle = {Proceedings of The First Mathematical and Scientific Machine Learning Conference},
  pages     = {193--206},
  year      = {2020},
  editor    = {Jianfeng Lu and Rachel Ward},
  volume    = {107},
  series    = {Proceedings of Machine Learning Research},
  address   = {Princeton University, Princeton, NJ, USA},
  month     = {20--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v107/liu20a/liu20a.pdf},
  url       = {http://proceedings.mlr.press/v107/liu20a.html},
  abstract  = {Supervised learning from training data with imbalanced class sizes, a commonly encountered scenario in real applications such as anomaly/fraud detection, has long been considered a significant challenge in machine learning. Motivated by recent progress in curriculum and self-paced learning, we propose to adopt a semi-supervised learning paradigm by training a deep neural network, referred to as SelectNet, to selectively add unlabelled data together with their predicted labels to the training dataset. Unlike existing techniques designed to tackle the difficulty in dealing with class imbalanced training data such as resampling, cost-sensitive learning, and margin-based learning, SelectNet provides an end-to-end approach for learning from important unlabelled data “in the wild” that most likely belong to the under-sampled classes in the training data, thus gradually mitigates the imbalance in the data used for training the classifier. We demonstrate the efficacy of SelectNet through extensive numerical experiments on standard datasets in computer vision.}
}
Endnote
%0 Conference Paper
%T SelectNet: Learning to Sample from the Wild for Imbalanced Data Training
%A Yunru Liu
%A Tingran Gao
%A Haizhao Yang
%B Proceedings of The First Mathematical and Scientific Machine Learning Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Jianfeng Lu
%E Rachel Ward
%F pmlr-v107-liu20a
%I PMLR
%J Proceedings of Machine Learning Research
%P 193--206
%U http://proceedings.mlr.press
%V 107
%W PMLR
%X Supervised learning from training data with imbalanced class sizes, a commonly encountered scenario in real applications such as anomaly/fraud detection, has long been considered a significant challenge in machine learning. Motivated by recent progress in curriculum and self-paced learning, we propose to adopt a semi-supervised learning paradigm by training a deep neural network, referred to as SelectNet, to selectively add unlabelled data together with their predicted labels to the training dataset. Unlike existing techniques designed to tackle the difficulty in dealing with class imbalanced training data such as resampling, cost-sensitive learning, and margin-based learning, SelectNet provides an end-to-end approach for learning from important unlabelled data “in the wild” that most likely belong to the under-sampled classes in the training data, thus gradually mitigates the imbalance in the data used for training the classifier. We demonstrate the efficacy of SelectNet through extensive numerical experiments on standard datasets in computer vision.
APA
Liu, Y., Gao, T. & Yang, H. (2020). SelectNet: Learning to Sample from the Wild for Imbalanced Data Training. Proceedings of The First Mathematical and Scientific Machine Learning Conference, in PMLR 107:193-206.

Related Material