Best Agglomerative Ranked Subset for Feature Selection


Roberto Ruiz, José C. Riquelme, Jesús S. Aguilar-Ruiz ;
Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, PMLR 4:148-162, 2008.


The enormous increase of the size in databases makes finding an optimal subset of features extremely difficult. In this paper, a new feature selection method is proposed that will allow any subset evaluator -including the wrapper evaluation method- to be used to find a group of features that will allow a distinction to be made between the different possible classes. The method, BARS (Best Agglomerative Ranked Subset), is based on the idea of relevance and redundancy, in the sense that a ranked feature (or set) is more relevant if it adds information when it is included in the final subset of selected features. This heuristic method reduces dimensionality drastically and leads to improvements in the accuracy, in comparison to a complete set and as opposed to other feature selection algorithms.

Related Material