Sparsity on Statistical Simplexes and Diversity in Social Ranking

Ke Sun; Hisham Mohamed; Stephane Marchand-Maillet

Sparsity on Statistical Simplexes and Diversity in Social Ranking

Ke Sun, Hisham Mohamed, Stephane Marchand-Maillet

Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:16-31, 2015.

Abstract

Sparsity in \Re^m has been widely explored in machine learning. We study sparsity on a statistical simplex consisting of all categorical distributions. This is different from the case in \Re^m because such a simplex is a Riemannian manifold, a curved space. A learner with sparse constraints should be likely to fall to its low-dimensional boundaries. We present a novel analysis on the statistical simplex as a manifold with boundary. The main contribution is an explicit view of the learning dynamics in between high-dimensional models in the interior of the simplex and low-dimensional models on its boundaries. We prove the differentiability of the cost function, the natural gradient with respect to the Riemannian structure, and convexity around the singular regions. We uncover an interesting relationship with L_1 regularization. We apply the proposed technique to social network analysis. Given a directed graph, the task is to rank a subset of influencer nodes. Here, sparsity means that the top-ranked nodes should present diversity in the sense of minimizing influence overlap. We present a ranking algorithm based on the natural gradient. It can scale up to graph datasets with millions of nodes. On real large networks, the top-ranked nodes are the most informative among several commonly-used techniques.

Cite this Paper

BibTeX


@InProceedings{pmlr-v39-sun14,
  title = 	 {Sparsity on Statistical Simplexes and Diversity in Social Ranking},
  author = 	 {Sun, Ke and Mohamed, Hisham and Marchand-Maillet, Stephane},
  booktitle = 	 {Proceedings of the Sixth Asian Conference on Machine Learning},
  pages = 	 {16--31},
  year = 	 {2015},
  editor = 	 {Phung, Dinh and Li, Hang},
  volume = 	 {39},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Nha Trang City, Vietnam},
  month = 	 {26--28 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v39/sun14.pdf},
  url = 	 {https://proceedings.mlr.press/v39/sun14.html},
  abstract = 	 {Sparsity in \Re^m has been widely explored in machine learning. We study sparsity on a statistical simplex consisting of all categorical distributions. This is different from the case in \Re^m because such a simplex is a Riemannian manifold, a curved space. A learner with sparse constraints should be likely to fall to its low-dimensional boundaries. We present a novel analysis on the statistical simplex as a manifold with boundary. The main contribution is an explicit view of the learning dynamics in between high-dimensional models in the interior of the simplex and low-dimensional models on its boundaries. We prove the differentiability of the cost function, the natural gradient with respect to the Riemannian structure, and convexity around the singular regions. We uncover an interesting relationship with L_1 regularization. We apply the proposed technique to social network analysis. Given a directed graph, the task is to rank a subset of influencer nodes. Here, sparsity means that the top-ranked nodes should present diversity in the sense of minimizing influence overlap. We present a ranking algorithm based on the natural gradient. It can scale up to graph datasets with millions of nodes. On real large networks, the top-ranked nodes are the most informative among several commonly-used techniques.}
}

Endnote

%0 Conference Paper
%T Sparsity on Statistical Simplexes and Diversity in Social Ranking
%A Ke Sun
%A Hisham Mohamed
%A Stephane Marchand-Maillet
%B Proceedings of the Sixth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Dinh Phung
%E Hang Li	
%F pmlr-v39-sun14
%I PMLR
%P 16--31
%U https://proceedings.mlr.press/v39/sun14.html
%V 39
%X Sparsity in \Re^m has been widely explored in machine learning. We study sparsity on a statistical simplex consisting of all categorical distributions. This is different from the case in \Re^m because such a simplex is a Riemannian manifold, a curved space. A learner with sparse constraints should be likely to fall to its low-dimensional boundaries. We present a novel analysis on the statistical simplex as a manifold with boundary. The main contribution is an explicit view of the learning dynamics in between high-dimensional models in the interior of the simplex and low-dimensional models on its boundaries. We prove the differentiability of the cost function, the natural gradient with respect to the Riemannian structure, and convexity around the singular regions. We uncover an interesting relationship with L_1 regularization. We apply the proposed technique to social network analysis. Given a directed graph, the task is to rank a subset of influencer nodes. Here, sparsity means that the top-ranked nodes should present diversity in the sense of minimizing influence overlap. We present a ranking algorithm based on the natural gradient. It can scale up to graph datasets with millions of nodes. On real large networks, the top-ranked nodes are the most informative among several commonly-used techniques.

RIS


TY  - CPAPER
TI  - Sparsity on Statistical Simplexes and Diversity in Social Ranking
AU  - Ke Sun
AU  - Hisham Mohamed
AU  - Stephane Marchand-Maillet
BT  - Proceedings of the Sixth Asian Conference on Machine Learning
DA  - 2015/02/16
ED  - Dinh Phung
ED  - Hang Li	
ID  - pmlr-v39-sun14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 39
SP  - 16
EP  - 31
L1  - http://proceedings.mlr.press/v39/sun14.pdf
UR  - https://proceedings.mlr.press/v39/sun14.html
AB  - Sparsity in \Re^m has been widely explored in machine learning. We study sparsity on a statistical simplex consisting of all categorical distributions. This is different from the case in \Re^m because such a simplex is a Riemannian manifold, a curved space. A learner with sparse constraints should be likely to fall to its low-dimensional boundaries. We present a novel analysis on the statistical simplex as a manifold with boundary. The main contribution is an explicit view of the learning dynamics in between high-dimensional models in the interior of the simplex and low-dimensional models on its boundaries. We prove the differentiability of the cost function, the natural gradient with respect to the Riemannian structure, and convexity around the singular regions. We uncover an interesting relationship with L_1 regularization. We apply the proposed technique to social network analysis. Given a directed graph, the task is to rank a subset of influencer nodes. Here, sparsity means that the top-ranked nodes should present diversity in the sense of minimizing influence overlap. We present a ranking algorithm based on the natural gradient. It can scale up to graph datasets with millions of nodes. On real large networks, the top-ranked nodes are the most informative among several commonly-used techniques.
ER  -

APA


Sun, K., Mohamed, H. & Marchand-Maillet, S.. (2015). Sparsity on Statistical Simplexes and Diversity in Social Ranking. Proceedings of the Sixth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 39:16-31 Available from https://proceedings.mlr.press/v39/sun14.html.

Related Material

Download PDF