Generative Oversampling for Imbalanced Data via Majority-Guided VAE

Qingzhong Ai, Pengyun Wang, Lirong He, Liangjian Wen, Lujia Pan, Zenglin Xu
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3315-3330, 2023.

Abstract

Learning with imbalanced data is a challenging problem in deep learning. Over-sampling is a widely used technique to re-balance the sampling distribution of training data. However, most existing over-sampling methods use only the intra-class information of minority classes to augment the data and ignore inter-class relationships with the majority ones, which makes them prone to overfitting, especially when the imbalance ratio is large. To address this issue, we propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior. In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks. Furthermore, to prevent model collapse under limited data, we first pre-train MGVAE on the abundant majority samples and then fine-tune it on minority samples with Elastic Weight Consolidation (EWC) regularization. Experimental results on benchmark image datasets and real-world tabular data show that MGVAE achieves competitive improvements over other over-sampling methods in downstream classification tasks, demonstrating the effectiveness of our method.
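
The two-stage recipe named in the abstract can be made concrete with the standard Elastic Weight Consolidation penalty (Kirkpatrick et al., 2017): after pre-training on majority samples, fine-tuning on minority samples adds a quadratic term (lam/2) * sum_i F_i * (theta_i - theta_i*)^2 that anchors the weights to the pre-trained solution theta*, weighted by the diagonal Fisher information F estimated on the majority data. The PyTorch sketch below illustrates only this generic EWC mechanism; the names (fisher_diagonal, ewc_penalty, vae_loss) are illustrative assumptions, not the authors' implementation.

import torch

def fisher_diagonal(model, majority_loader, loss_fn):
    # Diagonal Fisher information, estimated as the mean squared gradient
    # of the pre-training loss over the majority data.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x in majority_loader:
        model.zero_grad()
        loss_fn(model, x).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(majority_loader), 1) for n, f in fisher.items()}

def ewc_penalty(model, fisher, theta_star, lam=1.0):
    # (lam / 2) * sum_i F_i * (theta_i - theta_i*)^2 -- keeps the
    # fine-tuned weights close to the majority-pretrained solution.
    penalty = sum((fisher[n] * (p - theta_star[n]) ** 2).sum()
                  for n, p in model.named_parameters())
    return 0.5 * lam * penalty

# Usage during fine-tuning on minority samples (vae_loss is a placeholder
# for the model's reconstruction + KL objective):
#   theta_star = {n: p.detach().clone() for n, p in model.named_parameters()}
#   fisher = fisher_diagonal(model, majority_loader, vae_loss)
#   loss = vae_loss(model, x_minority) + ewc_penalty(model, fisher, theta_star, lam=10.0)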

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-ai23a,
  title     = {Generative Oversampling for Imbalanced Data via Majority-Guided VAE},
  author    = {Ai, Qingzhong and Wang, Pengyun and He, Lirong and Wen, Liangjian and Pan, Lujia and Xu, Zenglin},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {3315--3330},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/ai23a/ai23a.pdf},
  url       = {https://proceedings.mlr.press/v206/ai23a.html},
  abstract  = {Learning with imbalanced data is a challenging problem in deep learning. Over-sampling is a widely used technique to re-balance the sampling distribution of training data. However, most existing over-sampling methods use only the intra-class information of minority classes to augment the data and ignore inter-class relationships with the majority ones, which makes them prone to overfitting, especially when the imbalance ratio is large. To address this issue, we propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior. In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks. Furthermore, to prevent model collapse under limited data, we first pre-train MGVAE on the abundant majority samples and then fine-tune it on minority samples with Elastic Weight Consolidation (EWC) regularization. Experimental results on benchmark image datasets and real-world tabular data show that MGVAE achieves competitive improvements over other over-sampling methods in downstream classification tasks, demonstrating the effectiveness of our method.}
}
Endnote
%0 Conference Paper
%T Generative Oversampling for Imbalanced Data via Majority-Guided VAE
%A Qingzhong Ai
%A Pengyun Wang
%A Lirong He
%A Liangjian Wen
%A Lujia Pan
%A Zenglin Xu
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-ai23a
%I PMLR
%P 3315--3330
%U https://proceedings.mlr.press/v206/ai23a.html
%V 206
%X Learning with imbalanced data is a challenging problem in deep learning. Over-sampling is a widely used technique to re-balance the sampling distribution of training data. However, most existing over-sampling methods use only the intra-class information of minority classes to augment the data and ignore inter-class relationships with the majority ones, which makes them prone to overfitting, especially when the imbalance ratio is large. To address this issue, we propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior. In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks. Furthermore, to prevent model collapse under limited data, we first pre-train MGVAE on the abundant majority samples and then fine-tune it on minority samples with Elastic Weight Consolidation (EWC) regularization. Experimental results on benchmark image datasets and real-world tabular data show that MGVAE achieves competitive improvements over other over-sampling methods in downstream classification tasks, demonstrating the effectiveness of our method.
APA
Ai, Q., Wang, P., He, L., Wen, L., Pan, L. & Xu, Z. (2023). Generative Oversampling for Imbalanced Data via Majority-Guided VAE. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3315-3330. Available from https://proceedings.mlr.press/v206/ai23a.html.

Related Material

Download PDF: https://proceedings.mlr.press/v206/ai23a/ai23a.pdf