Robust Non-linear Normalization of Heterogeneous Feature Distributions with Adaptive Tanh-Estimators
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:406-414, 2024.
Abstract
Feature normalization is a crucial step in machine learning that scales numerical values to improve model effectiveness. Noisy or impure datasets pose a challenge for traditional normalization methods: outliers violate their statistical assumptions, degrading model performance and making training less predictable. Non-linear Tanh-Estimators (TE) have been found to provide robust feature normalization, but their fixed scaling factor may not be appropriate for all distributions of feature values. This work presents a refinement of the TE that employs the Wasserstein distance to adaptively estimate the optimal scaling factor for each feature individually against a specified target distribution. The results demonstrate that this adaptive approach can outperform the standard TE method in convergence speed: better-scaled features give training a stronger start, reducing or eliminating the need to re-adjust model weights during early training phases. Empirical evaluation was conducted on synthetic data, standard toy computer vision datasets, and a real-world numeric tabular dataset.
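
The abstract only sketches the mechanism, so the following Python snippet illustrates one plausible reading of it. The tanh-estimator form x' = 0.5 * (tanh(c * (x - mu) / sigma) + 1) with the commonly cited fixed factor c = 0.01 follows the standard formulation in the literature; the grid search over c, the median/MAD location and scale (standing in for the original Hampel estimators), and the uniform target distribution are illustrative assumptions, not the authors' exact procedure.

import numpy as np
from scipy.stats import wasserstein_distance

def tanh_estimator(x, c):
    # Median/MAD stand in for the Hampel location/scale estimators of the
    # classic tanh-estimator; a common robust simplification, not the
    # paper's exact choice.
    med = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - med)) + 1e-12
    return 0.5 * (np.tanh(c * (x - med) / sigma) + 1.0)

def adaptive_scale(x, target_samples, grid=np.logspace(-3, 1, 50)):
    # Grid-search the scaling factor c whose normalized output is closest,
    # in 1-D Wasserstein distance, to samples from the target distribution.
    return min(grid, key=lambda c: wasserstein_distance(tanh_estimator(x, c),
                                                        target_samples))

rng = np.random.default_rng(0)
# One synthetic feature with heavy outliers, as in the noisy-data setting.
x = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(25.0, 1.0, 20)])
# Hypothetical target distribution: uniform on [0, 1].
target = rng.uniform(0.0, 1.0, 1000)
c_star = adaptive_scale(x, target)
print(f"per-feature adaptive scaling factor: {c_star:.4f}")

Run per feature, this yields an individual scaling factor in place of the single fixed c = 0.01, which is the adaptivity the abstract describes; the fixed-factor baseline is recovered by skipping the search and passing c = 0.01 directly.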