Comparing Genetic Algorithms and Principal Component Analysis in Reducing Feature Dimensionality

Aiman Naeem, Muhammad Farhan Khan, Saeid Rezaei, Adeel Iqbal, Muhammad Sohail, Munsif Jatoi, Atif Shakeel
Reliable and Trustworthy Artificial Intelligence 2025, PMLR 310:45-58, 2025.

Abstract

Dimensionality reduction and feature selection are important for building efficient and interpretable machine learning models, especially on high-dimensional datasets. Using data up to October 2023, this study compares two methods, Genetic Algorithm (GA) and Principal Component Analysis (PCA), to assess their usefulness for dimensionality reduction while preserving predictive performance. As a case study, we applied both methods to the Breast Cancer Wisconsin (Diagnostic) dataset, which comprises 30 real-valued features describing the characteristics of cell nuclei. PCA was used to reduce the dimensionality of the dataset while retaining 95% of the variance, and the GA was used to select a minimal subset of relevant features based on a fitness function. To evaluate the effect of the reduced dimensionality on classification accuracy, a Random Forest classifier was used. Experimental results show that the GA-selected features achieved an accuracy of 98.25%, compared with 93.86% for the PCA-transformed features, although the GA came at a high computational cost. Finally, visualizations illustrate the variance retained by PCA, the importance of the GA-selected features to model performance, and how both methods influence the models. The quantitative results of this study can be used to identify the trade-off between statistical and heuristic approaches, showing what needs to be prioritized for a given application when choosing a dimensionality reduction method.
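
The two reduction strategies named in the abstract can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the authors' implementation: the function names, the GA settings (population size, tournament selection, one-point crossover, elitism, mutation rate) and the toy fitness are all hypothetical. In a setup like the one described, the eigenvalues would come from the covariance matrix of the data, and the fitness would presumably be a classifier-based score computed on the selected feature columns.

```python
import random
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading principal components whose cumulative
    explained-variance ratio reaches `threshold` (0.95 mirrors the abstract)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def ga_select(fitness, n_features, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Evolve a 0/1 feature mask that maximises `fitness(mask)`.

    Assumes n_features >= 2 and pop_size >= 2. All defaults are
    illustrative choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical stand-in for a classifier-based fitness: reward three
# "informative" feature indices and penalise the size of the subset.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    print(components_for_variance([90, 6, 4]))
    print(ga_select(toy_fitness, n_features=10))
```

The size penalty in `toy_fitness` is what pushes the search toward a small subset; with a real classifier score in its place, every fitness call retrains a model, which is consistent with the computational cost the abstract attributes to the GA.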

Cite this Paper


BibTeX
@InProceedings{pmlr-v310-naeem25a,
  title = {Comparing Genetic Algorithms and Principal Component Analysis in Reducing Feature Dimensionality},
  author = {Naeem, Aiman and Khan, Muhammad Farhan and Rezaei, Saeid and Iqbal, Adeel and Sohail, Muhammad and Jatoi, Munsif and Shakeel, Atif},
  booktitle = {Reliable and Trustworthy Artificial Intelligence 2025},
  pages = {45--58},
  year = {2025},
  editor = {Nguyen, Hoang D. and Le, Duc-Trong and Björklund, Johanna and Vu, Xuan-Son},
  volume = {310},
  series = {Proceedings of Machine Learning Research},
  month = {12 Dec},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v310/main/assets/naeem25a/naeem25a.pdf},
  url = {https://proceedings.mlr.press/v310/naeem25a.html},
  abstract = {Dimensionality reduction and feature selection are important for building efficient and interpretable machine learning models, especially on high-dimensional datasets. Using data up to October 2023, this study compares two methods, Genetic Algorithm (GA) and Principal Component Analysis (PCA), to assess their usefulness for dimensionality reduction while preserving predictive performance. As a case study, we applied both methods to the Breast Cancer Wisconsin (Diagnostic) dataset, which comprises 30 real-valued features describing the characteristics of cell nuclei. PCA was used to reduce the dimensionality of the dataset while retaining 95% of the variance, and the GA was used to select a minimal subset of relevant features based on a fitness function. To evaluate the effect of the reduced dimensionality on classification accuracy, a Random Forest classifier was used. Experimental results show that the GA-selected features achieved an accuracy of 98.25%, compared with 93.86% for the PCA-transformed features, although the GA came at a high computational cost. Finally, visualizations illustrate the variance retained by PCA, the importance of the GA-selected features to model performance, and how both methods influence the models. The quantitative results of this study can be used to identify the trade-off between statistical and heuristic approaches, showing what needs to be prioritized for a given application when choosing a dimensionality reduction method.}
}
Endnote
%0 Conference Paper
%T Comparing Genetic Algorithms and Principal Component Analysis in Reducing Feature Dimensionality
%A Aiman Naeem
%A Muhammad Farhan Khan
%A Saeid Rezaei
%A Adeel Iqbal
%A Muhammad Sohail
%A Munsif Jatoi
%A Atif Shakeel
%B Reliable and Trustworthy Artificial Intelligence 2025
%C Proceedings of Machine Learning Research
%D 2025
%E Hoang D. Nguyen
%E Duc-Trong Le
%E Johanna Björklund
%E Xuan-Son Vu
%F pmlr-v310-naeem25a
%I PMLR
%P 45--58
%U https://proceedings.mlr.press/v310/naeem25a.html
%V 310
%X Dimensionality reduction and feature selection are important for building efficient and interpretable machine learning models, especially on high-dimensional datasets. Using data up to October 2023, this study compares two methods, Genetic Algorithm (GA) and Principal Component Analysis (PCA), to assess their usefulness for dimensionality reduction while preserving predictive performance. As a case study, we applied both methods to the Breast Cancer Wisconsin (Diagnostic) dataset, which comprises 30 real-valued features describing the characteristics of cell nuclei. PCA was used to reduce the dimensionality of the dataset while retaining 95% of the variance, and the GA was used to select a minimal subset of relevant features based on a fitness function. To evaluate the effect of the reduced dimensionality on classification accuracy, a Random Forest classifier was used. Experimental results show that the GA-selected features achieved an accuracy of 98.25%, compared with 93.86% for the PCA-transformed features, although the GA came at a high computational cost. Finally, visualizations illustrate the variance retained by PCA, the importance of the GA-selected features to model performance, and how both methods influence the models. The quantitative results of this study can be used to identify the trade-off between statistical and heuristic approaches, showing what needs to be prioritized for a given application when choosing a dimensionality reduction method.
APA
Naeem, A., Khan, M.F., Rezaei, S., Iqbal, A., Sohail, M., Jatoi, M. & Shakeel, A. (2025). Comparing Genetic Algorithms and Principal Component Analysis in Reducing Feature Dimensionality. Reliable and Trustworthy Artificial Intelligence 2025, in Proceedings of Machine Learning Research 310:45-58. Available from https://proceedings.mlr.press/v310/naeem25a.html.
