Sparsity on Statistical Simplexes and Diversity in Social Ranking

Ke Sun, Hisham Mohamed, Stephane Marchand-Maillet
Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:16-31, 2015.

Abstract

Sparsity in ℝ^m has been widely explored in machine learning. We study sparsity on a statistical simplex consisting of all categorical distributions. This differs from the case in ℝ^m because such a simplex is a Riemannian manifold, a curved space. A learner under sparse constraints is likely to fall onto the simplex's low-dimensional boundaries. We present a novel analysis of the statistical simplex as a manifold with boundary. The main contribution is an explicit view of the learning dynamics between high-dimensional models in the interior of the simplex and low-dimensional models on its boundaries. We prove the differentiability of the cost function, derive the natural gradient with respect to the Riemannian structure, and establish convexity around the singular regions. We also uncover an interesting relationship with L_1 regularization. We apply the proposed technique to social network analysis: given a directed graph, the task is to rank a subset of influencer nodes. Here, sparsity means that the top-ranked nodes should exhibit diversity, in the sense of minimizing influence overlap. We present a ranking algorithm based on the natural gradient that scales to graphs with millions of nodes. On large real-world networks, its top-ranked nodes are the most informative among several commonly used techniques.
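As a concrete reference point (standard information geometry, sketched here for the reader; the symbols g_{ij}, G, and f are ours, not quoted from the paper): the simplex of categorical distributions over m+1 outcomes carries the Fisher information metric, and in the coordinates (p_1, \dots, p_m) with p_0 = 1 - \sum_{i=1}^m p_i, the metric and its inverse are

\[
g_{ij}(p) \;=\; \frac{\delta_{ij}}{p_i} + \frac{1}{p_0},
\qquad
\big(G^{-1}\big)_{ij} \;=\; p_i\,\delta_{ij} - p_i p_j,
\qquad i, j = 1, \dots, m,
\]

so the natural gradient $\tilde\nabla f = G^{-1}\nabla f$ of a cost $f$ has components

\[
\big(\tilde\nabla f\big)_i \;=\; p_i \Big( \frac{\partial f}{\partial p_i} - \sum_{j=1}^{m} p_j\, \frac{\partial f}{\partial p_j} \Big).
\]

The factor p_i shrinks the update as a coordinate approaches zero, which is the learning dynamics near the boundary that the abstract refers to; boundary points are exactly the sparse, lower-dimensional models. Below is a minimal numerical sketch of one natural-gradient ascent step, treating all m+1 coordinates symmetrically (the replicator direction). The function name, step size, and toy objective are our own illustrative choices; this shows the update direction only and is not the authors' ranking algorithm.

import numpy as np

def natural_gradient_step(p, grad_f, lr=0.1):
    """One natural-gradient ascent step on the probability simplex.

    Under the Fisher metric, the natural gradient at p has components
    p_i * (g_i - <p, g>) with g = grad_f(p).  This direction sums to
    zero, so the iterate stays on the affine hull of the simplex;
    clipping handles iterates that overshoot the boundary, where the
    sparse solutions sit.  Illustrative sketch, not the authors' code.
    """
    g = grad_f(p)
    nat = p * (g - np.dot(p, g))       # inverse Fisher metric applied to g
    q = np.clip(p + lr * nat, 0.0, None)
    return q / q.sum()                 # renormalize after clipping

# Example: maximize the linear cost f(p) = <w, p>, whose maximizer is a
# vertex of the simplex, so the iterates drift toward a sparse boundary point.
w = np.array([0.2, 0.5, 0.3])
p = np.full(3, 1.0 / 3.0)
for _ in range(200):
    p = natural_gradient_step(p, lambda q: w)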

Cite this Paper


BibTeX
@InProceedings{pmlr-v39-sun14,
  title     = {Sparsity on Statistical Simplexes and Diversity in Social Ranking},
  author    = {Sun, Ke and Mohamed, Hisham and Marchand-Maillet, Stephane},
  booktitle = {Proceedings of the Sixth Asian Conference on Machine Learning},
  pages     = {16--31},
  year      = {2015},
  editor    = {Phung, Dinh and Li, Hang},
  volume    = {39},
  series    = {Proceedings of Machine Learning Research},
  address   = {Nha Trang City, Vietnam},
  month     = {26--28 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v39/sun14.pdf},
  url       = {https://proceedings.mlr.press/v39/sun14.html}
}
Endnote
%0 Conference Paper
%T Sparsity on Statistical Simplexes and Diversity in Social Ranking
%A Ke Sun
%A Hisham Mohamed
%A Stephane Marchand-Maillet
%B Proceedings of the Sixth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Dinh Phung
%E Hang Li
%F pmlr-v39-sun14
%I PMLR
%P 16--31
%U https://proceedings.mlr.press/v39/sun14.html
%V 39
RIS
TY - CPAPER
TI - Sparsity on Statistical Simplexes and Diversity in Social Ranking
AU - Ke Sun
AU - Hisham Mohamed
AU - Stephane Marchand-Maillet
BT - Proceedings of the Sixth Asian Conference on Machine Learning
DA - 2015/02/16
ED - Dinh Phung
ED - Hang Li
ID - pmlr-v39-sun14
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 39
SP - 16
EP - 31
L1 - http://proceedings.mlr.press/v39/sun14.pdf
UR - https://proceedings.mlr.press/v39/sun14.html
ER -
APA
Sun, K., Mohamed, H. & Marchand-Maillet, S. (2015). Sparsity on Statistical Simplexes and Diversity in Social Ranking. Proceedings of the Sixth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 39:16-31. Available from https://proceedings.mlr.press/v39/sun14.html.
