FSDA: Tackling Tail-Event Analysis in Imbalanced Time Series Data with Feature Selection and Data Augmentation
Proceedings of the Fifth International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 241:45-58, 2024.
Abstract
Efficient management of imbalanced time series data is of paramount importance when data located in the tails, particularly extreme values, have a substantial influence on predictive outcomes. This paper introduces FSDA (Feature Selection and Data Augmentation), a combined approach of feature selection and data augmentation, to address this issue. FSDA aims to identify the most predictive features for tail data, which may exhibit different sensitivities compared to the rest of the dataset. Data augmentation, a conventional technique for handling imbalanced data, is employed to enhance the accuracy of machine learning regression methods. Augmented information is strategically incorporated using time-warping and drift methods to maintain the temporal integrity of the data. Empirical evidence based on a use case in financial data reveals that FSDA consistently outperforms feature selection (FS) and data augmentation (DA) methods across all percentiles ranging from 85 to 99, demonstrating its efficacy in managing imbalanced time series data and improving predictive accuracy.
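The abstract names time-warping and drift as the augmentation methods but gives no implementation details. The following is a minimal sketch of one common way to realise each of them with NumPy, not the authors' code. The function names (`time_warp`, `drift`, `tail_mask`) and the parameters (`sigma`, `max_drift`, `n_knots`, `q`) are hypothetical choices made for illustration.

```python
import numpy as np


def tail_mask(y, q=95):
    """Boolean mask of samples at or above the q-th percentile of y."""
    return y >= np.percentile(y, q)


def time_warp(x, rng, sigma=0.2, n_knots=4):
    """Resample x along a smoothly distorted, monotonic time axis.

    Assumption: local speeds are log-normal at a few knots and linearly
    interpolated in between. The first and last points stay in place.
    """
    n = len(x)
    t = np.arange(n)
    knot_pos = np.linspace(0, n - 1, n_knots)
    knot_speed = np.exp(rng.normal(0.0, sigma, size=n_knots))  # always > 0
    speed = np.interp(t, knot_pos, knot_speed)
    warped = np.cumsum(speed)
    # Rescale so the warped axis still spans [0, n - 1].
    warped = (warped - warped[0]) / (warped[-1] - warped[0]) * (n - 1)
    return np.interp(warped, t, x)


def drift(x, rng, max_drift=0.1, n_knots=4):
    """Add a slowly varying offset, bounded by max_drift * std(x)."""
    n = len(x)
    knot_pos = np.linspace(0, n - 1, n_knots)
    knot_val = rng.uniform(-max_drift, max_drift, size=n_knots) * np.std(x)
    offset = np.interp(np.arange(n), knot_pos, knot_val)
    return x + offset


# Usage: augment only the windows whose target falls in the tail.
rng = np.random.default_rng(0)
windows = rng.normal(size=(200, 50))   # 200 synthetic windows of length 50
targets = rng.normal(size=200)         # one synthetic target per window
mask = tail_mask(targets, q=95)
augmented = np.stack([drift(time_warp(w, rng), rng) for w in windows[mask]])
```

Both transforms preserve the ordering of observations in time, which is the sense in which such methods keep the temporal structure of a series intact. How the augmented windows are combined with feature selection in FSDA is not specified in the abstract and is left out of this sketch.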