Quality assessment of nonlinear dimensionality reduction based on K-ary neighborhoods


John Lee, Michel Verleysen
Proceedings of the Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery at ECML/PKDD 2008, PMLR 4:21-35, 2008.


Nonlinear dimensionality reduction aims at providing low-dimensional representations of high-dimensional data sets. Many new methods have recently been proposed, but the question of their assessment and comparison remains open. This paper reviews some of the existing quality measures that are based on distance ranking and K-ary neighborhoods. In this context, the comparison of the ranks in the high- and low-dimensional spaces leads to the definition of the co-ranking matrix. Rank errors and concepts such as neighborhood intrusions and extrusions can be associated with different blocks of the co-ranking matrix. The considered quality criteria are then cast within this unifying framework and the blocks they involve are identified. The same framework allows us to propose simpler criteria, which quantify two aspects of the embedding, namely its overall quality and its tendency to favor either intrusions or extrusions. Finally, a simple experiment illustrates the soundness of the approach.
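
The co-ranking matrix and the two simpler criteria can be illustrated with a minimal sketch in Python with NumPy. The abstract does not state the formulas, so the following are assumptions: ranks are taken from Euclidean distances, entry (k, l) of the matrix counts the pairs whose rank is k in the high-dimensional space and l in the low-dimensional space, the overall quality is the mass of the upper-left K-by-K block divided by K N, and the behavior is the difference between the below-diagonal (intrusion) and above-diagonal (extrusion) mass of that block. The function names `rank_matrix`, `coranking_matrix`, `quality` and `behavior` are hypothetical and are not taken from the paper.

```python
import numpy as np


def rank_matrix(X):
    """Rank of every point j in the distance-sorted neighborhood of point i."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    np.fill_diagonal(D, -1.0)                  # force each point to be its own rank 0
    order = np.argsort(D, axis=1, kind="stable")
    n = X.shape[0]
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks


def coranking_matrix(X_high, X_low):
    """Assumed convention: rows index high-dimensional ranks, columns low-dimensional ranks."""
    rho = rank_matrix(X_high)                  # ranks in the high-dimensional space
    r = rank_matrix(X_low)                     # ranks in the low-dimensional space
    n = rho.shape[0]
    off_diag = ~np.eye(n, dtype=bool)          # skip the trivial rank-0 self pairs
    Q = np.zeros((n - 1, n - 1), dtype=int)
    np.add.at(Q, (rho[off_diag] - 1, r[off_diag] - 1), 1)
    return Q


def quality(Q, K):
    """Fraction of K-ary neighbors preserved (assumed definition of overall quality)."""
    n = Q.shape[0] + 1
    return Q[:K, :K].sum() / (K * n)


def behavior(Q, K):
    """Intrusions minus extrusions inside the K-by-K block (assumed sign convention)."""
    n = Q.shape[0] + 1
    block = Q[:K, :K]
    intrusions = np.tril(block, -1).sum()      # high-dimensional rank > low-dimensional rank
    extrusions = np.triu(block, 1).sum()       # high-dimensional rank < low-dimensional rank
    return (intrusions - extrusions) / (K * n)


# Usage on synthetic data: the "embedding" simply keeps the first two coordinates.
rng = np.random.default_rng(0)
X_high = rng.normal(size=(30, 5))
X_low = X_high[:, :2]
Q = coranking_matrix(X_high, X_low)
print(quality(Q, 5), behavior(Q, 5))
```

Under these assumptions, an embedding that preserves all ranks yields a diagonal co-ranking matrix, a quality of 1 for every K, and a behavior of 0. A positive behavior would indicate an embedding that tends to produce intrusions, a negative one extrusions.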
