Minority Oversampling for Imbalanced Data via Class-Preserving Regularized Auto-Encoders

Arnab Kumar Mondal, Lakshya Singhal, Piyush Tiwary, Parag Singla, Prathosh AP
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3440-3465, 2023.

Abstract

Class imbalance is a common phenomenon in multiple application domains such as healthcare, where the sample occurrence of one or few class categories is more prevalent in the dataset than the rest. This work addresses the class-imbalance issue by proposing an over-sampling method for the minority classes in the latent space of a Regularized Auto-Encoder (RAE). Specifically, we construct a latent space by maximizing the conditional data likelihood using an Encoder-Decoder structure, such that oversampling through convex combinations of latent samples preserves the class identity. A jointly-trained linear classifier that separates convexly coupled latent vectors from different classes is used to impose this property on the AE’s latent space. Further, the aforesaid linear classifier is used for final classification without retraining. We theoretically show that our method can achieve a low variance risk estimate compared to naive oversampling methods and is robust to overfitting. We conduct several experiments on benchmark datasets and show that our method outperforms the existing oversampling techniques for handling class imbalance. The code of the proposed method is available at: https://github.com/arnabkmondal/oversamplingrae.
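The central oversampling step described above — generating synthetic minority-class samples as convex combinations of latent vectors — can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function name, toy data, and pairing strategy are assumptions; the paper's method additionally trains the encoder-decoder and a joint linear classifier so that such combinations preserve class identity.

```python
import numpy as np

def oversample_latent(z_minority, n_new, rng=None):
    """Draw synthetic latent vectors as convex combinations
    lam * z_i + (1 - lam) * z_j of randomly paired minority-class
    latent samples (illustrative sketch only)."""
    rng = np.random.default_rng(rng)
    n = len(z_minority)
    i = rng.integers(0, n, size=n_new)          # first partner of each pair
    j = rng.integers(0, n, size=n_new)          # second partner of each pair
    lam = rng.uniform(0.0, 1.0, size=(n_new, 1))  # convex mixing weights
    return lam * z_minority[i] + (1.0 - lam) * z_minority[j]

# toy latent vectors for one minority class
z = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
z_new = oversample_latent(z, n_new=5, rng=0)
print(z_new.shape)  # (5, 2)
```

Each synthetic point lies on a segment between two existing latent samples, so it stays inside the convex hull of the minority class in latent space; the decoder (not shown) would then map these points back to data space.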

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-mondal23a,
  title     = {Minority Oversampling for Imbalanced Data via Class-Preserving Regularized Auto-Encoders},
  author    = {Mondal, Arnab Kumar and Singhal, Lakshya and Tiwary, Piyush and Singla, Parag and {AP}, Prathosh},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {3440--3465},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/mondal23a/mondal23a.pdf},
  url       = {https://proceedings.mlr.press/v206/mondal23a.html},
  abstract  = {Class imbalance is a common phenomenon in multiple application domains such as healthcare, where the sample occurrence of one or few class categories is more prevalent in the dataset than the rest. This work addresses the class-imbalance issue by proposing an over-sampling method for the minority classes in the latent space of a Regularized Auto-Encoder (RAE). Specifically, we construct a latent space by maximizing the conditional data likelihood using an Encoder-Decoder structure, such that oversampling through convex combinations of latent samples preserves the class identity. A jointly-trained linear classifier that separates convexly coupled latent vectors from different classes is used to impose this property on the AE’s latent space. Further, the aforesaid linear classifier is used for final classification without retraining. We theoretically show that our method can achieve a low variance risk estimate compared to naive oversampling methods and is robust to overfitting. We conduct several experiments on benchmark datasets and show that our method outperforms the existing oversampling techniques for handling class imbalance. The code of the proposed method is available at: https://github.com/arnabkmondal/oversamplingrae.}
}
Endnote
%0 Conference Paper
%T Minority Oversampling for Imbalanced Data via Class-Preserving Regularized Auto-Encoders
%A Arnab Kumar Mondal
%A Lakshya Singhal
%A Piyush Tiwary
%A Parag Singla
%A Prathosh AP
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-mondal23a
%I PMLR
%P 3440--3465
%U https://proceedings.mlr.press/v206/mondal23a.html
%V 206
%X Class imbalance is a common phenomenon in multiple application domains such as healthcare, where the sample occurrence of one or few class categories is more prevalent in the dataset than the rest. This work addresses the class-imbalance issue by proposing an over-sampling method for the minority classes in the latent space of a Regularized Auto-Encoder (RAE). Specifically, we construct a latent space by maximizing the conditional data likelihood using an Encoder-Decoder structure, such that oversampling through convex combinations of latent samples preserves the class identity. A jointly-trained linear classifier that separates convexly coupled latent vectors from different classes is used to impose this property on the AE’s latent space. Further, the aforesaid linear classifier is used for final classification without retraining. We theoretically show that our method can achieve a low variance risk estimate compared to naive oversampling methods and is robust to overfitting. We conduct several experiments on benchmark datasets and show that our method outperforms the existing oversampling techniques for handling class imbalance. The code of the proposed method is available at: https://github.com/arnabkmondal/oversamplingrae.
APA
Mondal, A.K., Singhal, L., Tiwary, P., Singla, P. & AP, P. (2023). Minority Oversampling for Imbalanced Data via Class-Preserving Regularized Auto-Encoders. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3440-3465. Available from https://proceedings.mlr.press/v206/mondal23a.html.

Related Material