Learning Tree Structures from Noisy Data

[edit]

Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Anand D. Sarwate ;
Proceedings of Machine Learning Research, PMLR 89:1771-1782, 2019.

Abstract

We provide high-probability sample complexity guarantees for exact structure recovery of tree-structured graphical models, when only noisy observations of the respective vertex emissions are available. We assume that the hidden variables follow either an Ising model or a Gaussian graphical model, and the observables are noise-corrupted versions of the hidden variables: We consider multiplicative $\pm 1$ binary noise for Ising models, and additive Gaussian noise for Gaussian models. Such hidden models arise naturally in a variety of applications such as physics, biology, computer science, and finance. We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known \textit{Chow-Liu algorithm} and provide formal sample complexity guarantees for exact recovery. In particular, for a tree with $p$ vertices and probability of failure $\delta>0$, we show that the number of necessary samples for exact structure recovery is of the order of $\mc{O}(\log(p/\delta))$ for Ising models (which remains the \textit{same as in the noiseless case}), and $\mc{O}(\mathrm{polylog}{(p/\delta)})$ for Gaussian models.

Related Material