Data Augmentation for Imbalanced Regression

Samuel Stocksieker, Denys Pommeret, Arthur Charpentier
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7774-7799, 2023.

Abstract

In this work, we consider the problem of imbalanced data in a regression framework when the imbalanced phenomenon concerns continuous or discrete covariates. Such a situation can lead to biases in the estimates. In this case, we propose a data augmentation algorithm that combines a weighted resampling (WR) and a data augmentation (DA) procedure. In a first step, the DA procedure permits exploring a wider support than the initial one. In a second step, the WR method drives the exogenous distribution to a target one. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, an actuarial application is studied.
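
The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single continuous covariate, uses a smoothed bootstrap (Gaussian jitter) as a stand-in DA procedure, and uses importance resampling toward a uniform target as a stand-in WR step. The function name `augment_then_resample` and the parameters `n_aug`, `n_out`, and `noise_scale` are hypothetical, introduced only for this example.

```python
import numpy as np
from scipy.stats import gaussian_kde


def augment_then_resample(x, y, n_aug=2000, n_out=1000, noise_scale=0.3, seed=0):
    """Illustrative sketch: Gaussian-jitter DA, then WR toward a uniform target on x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1 (DA, assumed form): draw observed pairs with replacement and jitter
    # them, so synthetic covariates can fall outside the observed support.
    # Jittering x and y independently is a simplification that blurs the
    # x-y relationship slightly.
    idx = rng.integers(0, len(x), size=n_aug)
    x_aug = x[idx] + rng.normal(0.0, noise_scale * x.std(), size=n_aug)
    y_aug = y[idx] + rng.normal(0.0, noise_scale * y.std(), size=n_aug)

    # Step 2 (WR, assumed form): weight each synthetic point by
    # target density / estimated density. With a uniform target the
    # weight is proportional to 1 / estimated density.
    density = gaussian_kde(x_aug)(x_aug)
    weights = 1.0 / density
    weights /= weights.sum()

    # Resample with replacement according to the normalized weights.
    keep = rng.choice(n_aug, size=n_out, replace=True, p=weights)
    return x_aug[keep], y_aug[keep]
```

Applied to a right-skewed covariate, the resampled covariate values are spread more evenly over the augmented range than the original sample, which is the effect the WR step is meant to produce. Any other target distribution could be substituted by replacing the numerator of the weights.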

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-stocksieker23a,
  title = {Data Augmentation for Imbalanced Regression},
  author = {Stocksieker, Samuel and Pommeret, Denys and Charpentier, Arthur},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = {7774--7799},
  year = {2023},
  editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = {206},
  series = {Proceedings of Machine Learning Research},
  month = {25--27 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v206/stocksieker23a/stocksieker23a.pdf},
  url = {https://proceedings.mlr.press/v206/stocksieker23a.html},
  abstract = {In this work, we consider the problem of imbalanced data in a regression framework when the imbalanced phenomenon concerns continuous or discrete covariates. Such a situation can lead to biases in the estimates. In this case, we propose a data augmentation algorithm that combines a weighted resampling (WR) and a data augmentation (DA) procedure. In a first step, the DA procedure permits exploring a wider support than the initial one. In a second step, the WR method drives the exogenous distribution to a target one. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, an actuarial application is studied.}
}
Endnote
%0 Conference Paper
%T Data Augmentation for Imbalanced Regression
%A Samuel Stocksieker
%A Denys Pommeret
%A Arthur Charpentier
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-stocksieker23a
%I PMLR
%P 7774--7799
%U https://proceedings.mlr.press/v206/stocksieker23a.html
%V 206
%X In this work, we consider the problem of imbalanced data in a regression framework when the imbalanced phenomenon concerns continuous or discrete covariates. Such a situation can lead to biases in the estimates. In this case, we propose a data augmentation algorithm that combines a weighted resampling (WR) and a data augmentation (DA) procedure. In a first step, the DA procedure permits exploring a wider support than the initial one. In a second step, the WR method drives the exogenous distribution to a target one. We discuss the choice of the DA procedure through a numerical study that illustrates the advantages of this approach. Finally, an actuarial application is studied.
APA
Stocksieker, S., Pommeret, D. & Charpentier, A. (2023). Data Augmentation for Imbalanced Regression. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7774-7799. Available from https://proceedings.mlr.press/v206/stocksieker23a.html.
