Comparing Genetic Algorithms and Principal Component Analysis in Reducing Feature Dimensionality
Reliable and Trustworthy Artificial Intelligence 2025, PMLR 310:45-58, 2025.
Abstract
Dimensionality reduction and feature selection are essential for building efficient and interpretable machine learning models, especially on high-dimensional datasets. This study compares two methods, the Genetic Algorithm (GA) and Principal Component Analysis (PCA), assessing their usefulness for dimensionality reduction while preserving predictive performance. As a case study, we apply both methods to the Breast Cancer Wisconsin (Diagnostic) dataset, which comprises 30 real-valued features describing characteristics of cell nuclei. PCA is used to reduce the dimensionality of the dataset while explaining 95% of the variance, and the GA is used to select a minimal subset of relevant features based on a fitness function. A Random Forest classifier is used to evaluate the effect of the reduced dimensionality on classification accuracy. Experimental results show that the GA-selected features yield an accuracy of 98.25%, compared with 93.86% for PCA, albeit at a higher computational cost. Visualizations illustrate the variance retained by PCA, the importance that the GA assigns to individual features, and the influence of both methods on model performance. The quantitative results of this study highlight the trade-off between statistical and heuristic approaches, showing what should be prioritized, depending on the application, when choosing a dimensionality reduction method.
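The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses scikit-learn's copy of the Breast Cancer Wisconsin (Diagnostic) dataset, and the GA hyperparameters (population size 8, 5 generations, 5% bit-flip mutation, truncation selection) and the train/test split are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)  # 30 real-valued features
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# --- PCA: keep enough components to explain 95% of the variance ---
scaler = StandardScaler().fit(X_tr)
pca = PCA(n_components=0.95).fit(scaler.transform(X_tr))
rf_pca = RandomForestClassifier(random_state=42).fit(
    pca.transform(scaler.transform(X_tr)), y_tr)
acc_pca = accuracy_score(
    y_te, rf_pca.predict(pca.transform(scaler.transform(X_te))))

# --- GA: evolve a binary mask over the 30 features ---
rng = np.random.default_rng(42)
small_rf = RandomForestClassifier(n_estimators=25, random_state=42)

def fitness(mask):
    # Cross-validated accuracy of the feature subset; empty subsets score zero.
    if not mask.any():
        return 0.0
    return cross_val_score(small_rf, X_tr[:, mask], y_tr, cv=3).mean()

pop = rng.random((8, X.shape[1])) < 0.5          # random initial population
for _ in range(5):                               # a few generations (illustrative)
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[::-1][:4]]  # truncation selection
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.choice(len(parents), 2, replace=False)]
        cut = rng.integers(1, X.shape[1])        # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05     # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
rf_ga = RandomForestClassifier(random_state=42).fit(X_tr[:, best], y_tr)
acc_ga = accuracy_score(y_te, rf_ga.predict(X_te[:, best]))

print(f"PCA: {pca.n_components_} components, accuracy {acc_pca:.4f}")
print(f"GA:  {int(best.sum())} features,  accuracy {acc_ga:.4f}")
```

The fitness function here scores a candidate mask by the cross-validated accuracy of a small Random Forest on the selected columns, so the GA's search cost grows with population size and generation count, which illustrates the higher computational cost of the heuristic approach relative to a single PCA fit.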