FSDA: Tackling Tail-Event Analysis in Imbalanced Time Series Data with Feature Selection and Data Augmentation

Raphael Krief, Eric Benhamou, Beatrice Guez, Jean-Jacques Ohana, David Saltiel, Rida Laraki, Jamal Atif
Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 241:45-58, 2024.

Abstract

Efficient management of imbalanced time series data is of paramount importance when data located in the tails, particularly extreme values, have a substantial influence on predictive outcomes. This paper introduces FSDA (Feature Selection and Data Augmentation), a combined approach of feature selection and data augmentation, to address this issue. FSDA aims to identify the most predictive features for tail data, which may exhibit different sensitivities compared to the rest of the dataset. Data augmentation, a conventional technique for handling imbalanced data, is employed to enhance the accuracy of machine learning regression methods. Augmented information is strategically incorporated using time-warping and drift methods to maintain the temporal integrity of the data. Empirical evidence based on a use case in financial data reveals that FSDA consistently outperforms feature selection (FS) and data augmentation (DA) methods across all percentiles ranging from 85 to 99, demonstrating its efficacy in managing imbalanced time series data and improving predictive accuracy.
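
As a rough illustration of the two augmentation families named in the abstract, the sketch below shows one generic way to apply drift and time-warping to a one-dimensional series with NumPy. It is not the authors' implementation: the function names `drift` and `time_warp`, the parameters `max_drift`, `strength`, and `n_knots`, and the choice of a linear ramp and piecewise-linear warp are all assumptions made for this example.

```python
import numpy as np


def drift(x, max_drift=0.1, rng=None):
    """Add a slowly growing linear offset, scaled by the series' standard deviation.

    Hypothetical helper: a linear ramp is only one possible drift shape.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    slope = rng.uniform(-max_drift, max_drift)
    ramp = np.linspace(0.0, slope, num=len(x))  # starts at 0, so x[0] is unchanged
    return x + ramp * np.std(x)


def time_warp(x, strength=0.2, n_knots=4, rng=None):
    """Resample the series along a smoothly distorted, monotone time axis.

    Hypothetical helper: local "speeds" are drawn at a few knots, interpolated
    to every step, and accumulated into a warped clock. Assumes strength < 1
    so that every speed stays positive and order in time is preserved.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n, dtype=float)
    knot_pos = np.linspace(0, n - 1, num=n_knots + 2)
    knot_speed = rng.uniform(1 - strength, 1 + strength, size=n_knots + 2)
    speed = np.interp(t, knot_pos, knot_speed)
    warped = np.cumsum(speed)
    # Rescale so the warped clock still runs from 0 to n - 1.
    warped = (warped - warped[0]) / (warped[-1] - warped[0]) * (n - 1)
    # Treat x as observed at the warped times and read it back on the regular grid.
    return np.interp(t, warped, x)


if __name__ == "__main__":
    series = np.sin(np.linspace(0, 6, 50))
    augmented = time_warp(drift(series, rng=0), rng=1)
    print(augmented.shape)
```

Both transforms keep the length and ordering of the series, which is one plausible reading of "maintaining temporal integrity"; how the paper selects tail samples and combines augmentation with feature selection is not specified in the abstract and is not modelled here.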

Cite this Paper


BibTeX
@InProceedings{pmlr-v241-krief24a,
  title = {FSDA: Tackling Tail-Event Analysis in Imbalanced Time Series Data with Feature Selection and Data Augmentation},
  author = {Krief, Raphael and Benhamou, Eric and Guez, Beatrice and Ohana, Jean-Jacques and Saltiel, David and Laraki, Rida and Atif, Jamal},
  booktitle = {Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages = {45--58},
  year = {2024},
  editor = {Moniz, Nuno and Branco, Paula and Torgo, Luis and Japkowicz, Nathalie and Wozniak, Michal and Wang, Shuo},
  volume = {241},
  series = {Proceedings of Machine Learning Research},
  month = {18 Sep},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v241/krief24a/krief24a.pdf},
  url = {https://proceedings.mlr.press/v241/krief24a.html},
  abstract = {Efficient management of imbalanced time series data is of paramount importance when data located in the tails, particularly extreme values, have a substantial influence on predictive outcomes. This paper introduces FSDA (Feature Selection and Data Augmentation), a combined approach of feature selection and data augmentation, to address this issue. FSDA aims to identify the most predictive features for tail data, which may exhibit different sensitivities compared to the rest of the dataset. Data augmentation, a conventional technique for handling imbalanced data, is employed to enhance the accuracy of machine learning regression methods. Augmented information is strategically incorporated using time-warping and drift methods to maintain the temporal integrity of the data. Empirical evidence based on a use case in financial data reveals that FSDA consistently outperforms feature selection (FS) and data augmentation (DA) methods across all percentiles ranging from 85 to 99, demonstrating its efficacy in managing imbalanced time series data and improving predictive accuracy.}
}
Endnote
%0 Conference Paper
%T FSDA: Tackling Tail-Event Analysis in Imbalanced Time Series Data with Feature Selection and Data Augmentation
%A Raphael Krief
%A Eric Benhamou
%A Beatrice Guez
%A Jean-Jacques Ohana
%A David Saltiel
%A Rida Laraki
%A Jamal Atif
%B Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2024
%E Nuno Moniz
%E Paula Branco
%E Luis Torgo
%E Nathalie Japkowicz
%E Michal Wozniak
%E Shuo Wang
%F pmlr-v241-krief24a
%I PMLR
%P 45--58
%U https://proceedings.mlr.press/v241/krief24a.html
%V 241
%X Efficient management of imbalanced time series data is of paramount importance when data located in the tails, particularly extreme values, have a substantial influence on predictive outcomes. This paper introduces FSDA (Feature Selection and Data Augmentation), a combined approach of feature selection and data augmentation, to address this issue. FSDA aims to identify the most predictive features for tail data, which may exhibit different sensitivities compared to the rest of the dataset. Data augmentation, a conventional technique for handling imbalanced data, is employed to enhance the accuracy of machine learning regression methods. Augmented information is strategically incorporated using time-warping and drift methods to maintain the temporal integrity of the data. Empirical evidence based on a use case in financial data reveals that FSDA consistently outperforms feature selection (FS) and data augmentation (DA) methods across all percentiles ranging from 85 to 99, demonstrating its efficacy in managing imbalanced time series data and improving predictive accuracy.
APA
Krief, R., Benhamou, E., Guez, B., Ohana, J.-J., Saltiel, D., Laraki, R., & Atif, J. (2024). FSDA: Tackling Tail-Event Analysis in Imbalanced Time Series Data with Feature Selection and Data Augmentation. Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 241:45-58. Available from https://proceedings.mlr.press/v241/krief24a.html.
