[edit]

# Principal Subspace Estimation Under Information Diffusion

*Proceedings of The 24th International Conference on Artificial Intelligence and Statistics*, PMLR 130:3205-3213, 2021.

#### Abstract

Let $\mathbf{A} = \mathbf{L}_0 + \mathbf{S}_0$, where $\mathbf{L}_0 \in \mathbb{R}^{d\times d}$ is low rank and $\mathbf{S}_0$ is a perturbation matrix. We study the principal subspace estimation of $\mathbf{L}_0$ through observations $\mathbf{y}_j = f(\mathbf{A})\mathbf{x}_j$, $j=1,…,n$, where $f:\mathbb{R}\rightarrow \mathbb{R}$ is an unknown polynomial and $\mathbf{x}_j$’s are i.i.d. random input signals. Such models are widely used in graph signal processing to model information diffusion dynamics over networks with applications in network topology inference and data analysis. We develop an estimation procedure based on nuclear norm penalization, and establish upper bounds on the principal subspace estimation error when $\mathbf{A}$ is the adjacency matrix of a random graph generated by $\mathbf{L}_0$. Our theory shows that when the signal strength is strong enough, the exact rank of $\mathbf{L}_0$ can be recovered. By applying our results to blind community detection, we show that consistency of spectral clustering can be achieved for some popular stochastic block models. Together with the experimental results, our theory show that there is a fundamental limit of using the principal components obtained from diffused graph signals which is commonly adapted in current practice. Finally, under some structured perturbation $\mathbf{S}_0$, we build the connection between this model with spiked covariance model and develop a new estimation procedure. We show that such estimators can be optimal under the minimax paradigm.