Multi-Source Feature Selection via Geometry-Dependent Covariance Analysis


Zheng Zhao, Huan Liu ;
Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, PMLR 4:36-47, 2008.


Feature selection is an effective approach to reducing dimensionality by selecting relevant original features. In this work, we studied a novel problem of multi-source feature selection for unlabeled data: given multiple heterogeneous data sources (or data sets), select features from one source of interest by integrating information from various data sources. In essence, we investigate how we can employ the information contained in multiple data sources to effectively derive intrinsic relationships that can help select more meaningful (or domain relevant) features. We studied how to adjust the covariance matrix of a data set using the geometric structure obtained from multiple data sources, and how to select features of the target source using geometry-dependent covariance. We designed and conducted experiments to systematically compare the proposed approach with representative methods in our attempt to solve the novel problem of multi-source feature selection. The empirical study demonstrated the efficacy and potential of multi-source feature selection.

Related Material