Sparsity on Statistical Simplexes and Diversity in Social Ranking
Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:16-31, 2015.
Sparsity in \Re^m has been widely explored in machine learning. We study sparsity on a statistical simplex consisting of all categorical distributions. This is different from the case in \Re^m because such a simplex is a Riemannian manifold, a curved space. A learner with sparse constraints should be likely to fall to its low-dimensional boundaries. We present a novel analysis on the statistical simplex as a manifold with boundary. The main contribution is an explicit view of the learning dynamics in between high-dimensional models in the interior of the simplex and low-dimensional models on its boundaries. We prove the differentiability of the cost function, the natural gradient with respect to the Riemannian structure, and convexity around the singular regions. We uncover an interesting relationship with L_1 regularization. We apply the proposed technique to social network analysis. Given a directed graph, the task is to rank a subset of influencer nodes. Here, sparsity means that the top-ranked nodes should present diversity in the sense of minimizing influence overlap. We present a ranking algorithm based on the natural gradient. It can scale up to graph datasets with millions of nodes. On real large networks, the top-ranked nodes are the most informative among several commonly-used techniques.