Improving Imbalanced Learning by Pre-finetuning with Data Augmentation

Yiwen Shi, Taha ValizadehAslani, Jing Wang, Ping Ren, Yi Zhang, Meng Hu, Liang Zhao, Hualou Liang
Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 183:68-82, 2022.

Abstract

Imbalanced data, in which classes are unevenly distributed, is ubiquitous in real-world datasets. Such class imbalance poses a major challenge for modern deep learning, even with typical class-balancing approaches such as re-sampling and re-weighting. In this work, we introduced a simple training strategy, pre-finetuning, as a new intermediate training stage between pretraining and finetuning. During the pre-finetuning stage, we leveraged data augmentation to learn an initial representation that better fits the imbalanced class distribution of the downstream task. We tested our method on manually contrived imbalanced datasets (both two-class and multi-class) and on the FDA drug labeling dataset for ADME (i.e., absorption, distribution, metabolism, and excretion) classification. We found that, compared with standard single-stage training (i.e., vanilla finetuning), our method consistently improved model performance by large margins. Our work demonstrates that pre-finetuning is a simple, yet effective, learning strategy for imbalanced data.
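The two-stage recipe described in the abstract can be sketched in a few lines of PyTorch. The following is a minimal illustration, not the paper's actual implementation: the toy dataset, the stand-in classifier, the noise-injection augmentation, and all hyperparameters are assumptions chosen only to make the pre-finetune-then-finetune flow concrete.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def train_stage(model, loader, epochs, lr):
    # One training stage: plain cross-entropy minimization.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

# Toy two-class imbalanced dataset: 900 majority vs. 100 minority samples.
torch.manual_seed(0)
x_maj = torch.randn(900, 16) + 1.0
x_min = torch.randn(100, 16) - 1.0
x = torch.cat([x_maj, x_min])
y = torch.cat([torch.zeros(900, dtype=torch.long), torch.ones(100, dtype=torch.long)])

# Augment the minority class (here, simple noise injection) so the
# pre-finetuning set is class-balanced; the paper's augmentation may differ.
x_aug = x_min.repeat(8, 1) + 0.1 * torch.randn(800, 16)
x_pre = torch.cat([x, x_aug])
y_pre = torch.cat([y, torch.ones(800, dtype=torch.long)])

# Stand-in for a pretrained model: a small feed-forward classifier.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

# Stage 1 (pre-finetuning): train on the augmented, balanced data.
train_stage(model, DataLoader(TensorDataset(x_pre, y_pre), batch_size=64, shuffle=True), epochs=3, lr=1e-3)

# Stage 2 (finetuning): train on the original, imbalanced task data.
train_stage(model, DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True), epochs=3, lr=1e-4)

In the paper itself the pretrained model is a language model and the inputs are drug-label text; the point of the sketch is only the ordering of the two stages and the use of augmented, rebalanced data in the first.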

Cite this Paper

BibTeX
@InProceedings{pmlr-v183-shi22a,
  title     = {Improving Imbalanced Learning by Pre-finetuning with Data Augmentation},
  author    = {Shi, Yiwen and ValizadehAslani, Taha and Wang, Jing and Ren, Ping and Zhang, Yi and Hu, Meng and Zhao, Liang and Liang, Hualou},
  booktitle = {Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages     = {68--82},
  year      = {2022},
  editor    = {Moniz, Nuno and Branco, Paula and Torgo, Luís and Japkowicz, Nathalie and Wozniak, Michal and Wang, Shuo},
  volume    = {183},
  series    = {Proceedings of Machine Learning Research},
  month     = {23 Sep},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v183/shi22a/shi22a.pdf},
  url       = {https://proceedings.mlr.press/v183/shi22a.html},
  abstract  = {Imbalanced data is ubiquitous in the real world, where there is an uneven distribution of classes in the datasets. Such class imbalance poses a major challenge for modern deep learning, even with the typical class-balanced approaches such as re-sampling and re-weighting. In this work, we introduced a simple training strategy, namely pre-finetuning, as a new intermediate training stage in between the pretrained model and finetuning. We leveraged the idea of data augmentation to learn an initial representation that better fits the imbalanced distribution of the domain task during the pre-finetuning stage. We tested our method on manually contrived imbalanced datasets (both two-class and multi-class) and the FDA drug labeling dataset for ADME (i.e., absorption, distribution, metabolism, and excretion) classification. We found that, compared with standard single-stage training (i.e., vanilla finetuning), our method consistently attains improved model performance by large margins. Our work demonstrated that pre-finetuning is a simple, yet effective, learning strategy for imbalanced data.}
}
EndNote
%0 Conference Paper
%T Improving Imbalanced Learning by Pre-finetuning with Data Augmentation
%A Yiwen Shi
%A Taha ValizadehAslani
%A Jing Wang
%A Ping Ren
%A Yi Zhang
%A Meng Hu
%A Liang Zhao
%A Hualou Liang
%B Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2022
%E Nuno Moniz
%E Paula Branco
%E Luís Torgo
%E Nathalie Japkowicz
%E Michal Wozniak
%E Shuo Wang
%F pmlr-v183-shi22a
%I PMLR
%P 68--82
%U https://proceedings.mlr.press/v183/shi22a.html
%V 183
%X Imbalanced data is ubiquitous in the real world, where there is an uneven distribution of classes in the datasets. Such class imbalance poses a major challenge for modern deep learning, even with the typical class-balanced approaches such as re-sampling and re-weighting. In this work, we introduced a simple training strategy, namely pre-finetuning, as a new intermediate training stage in between the pretrained model and finetuning. We leveraged the idea of data augmentation to learn an initial representation that better fits the imbalanced distribution of the domain task during the pre-finetuning stage. We tested our method on manually contrived imbalanced datasets (both two-class and multi-class) and the FDA drug labeling dataset for ADME (i.e., absorption, distribution, metabolism, and excretion) classification. We found that, compared with standard single-stage training (i.e., vanilla finetuning), our method consistently attains improved model performance by large margins. Our work demonstrated that pre-finetuning is a simple, yet effective, learning strategy for imbalanced data.
APA
Shi, Y., ValizadehAslani, T., Wang, J., Ren, P., Zhang, Y., Hu, M., Zhao, L., & Liang, H. (2022). Improving Imbalanced Learning by Pre-finetuning with Data Augmentation. Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 183:68-82. Available from https://proceedings.mlr.press/v183/shi22a.html.