Nonlinear Dimensionality Reduction as Information Retrieval


Jarkko Venna, Samuel Kaski ;
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, PMLR 2:572-579, 2007.


Nonlinear dimensionality reduction has so far been treated either as a data representation problem or as a search for a lowerdimensional manifold embedded in the data space. A main application for both is in information visualization, to make visible the neighborhood or proximity relationships in the data, but neither approach has been designed to optimize this task. We give such visualization a new conceptualization as an information retrieval problem; a projection is good if neighbors of data points can be retrieved well based on the visualized projected points. This makes it possible to rigorously quantify goodness in terms of precision and recall. A method is introduced to optimize retrieval quality; it turns out to be an extension of Stochastic Neighbor Embedding, one of the earlier nonlinear projection methods, for which we give a new interpretation: it optimizes recall. The new method is shown empirically to outperform existing dimensionality reduction methods.

Related Material