Learning Tree Structures from Noisy Data

Konstantinos E. Nikolakakis, Dionysios S. Kalogerias, Anand D. Sarwate
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1771-1782, 2019.

Abstract

We provide high-probability sample complexity guarantees for exact structure recovery of tree-structured graphical models, when only noisy observations of the respective vertex emissions are available. We assume that the hidden variables follow either an Ising model or a Gaussian graphical model, and the observables are noise-corrupted versions of the hidden variables: we consider multiplicative $\pm 1$ binary noise for Ising models, and additive Gaussian noise for Gaussian models. Such hidden models arise naturally in a variety of applications such as physics, biology, computer science, and finance. We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known Chow-Liu algorithm and provide formal sample complexity guarantees for exact recovery. In particular, for a tree with $p$ vertices and probability of failure $\delta>0$, we show that the number of samples necessary for exact structure recovery is of the order of $\mathcal{O}(\log(p/\delta))$ for Ising models (which remains the same as in the noiseless case), and $\mathcal{O}(\mathrm{polylog}(p/\delta))$ for Gaussian models.
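For context, below is a minimal sketch of the standard Chow-Liu procedure that the abstract refers to: estimate the pairwise empirical mutual informations from the (possibly noisy) samples, then return a maximum-weight spanning tree over them. This is a textbook Python/NumPy implementation assuming $\pm 1$-valued observations as in the Ising setting; the function names (`empirical_mutual_information`, `chow_liu_tree`) are illustrative and not taken from the paper.

```python
import numpy as np

def empirical_mutual_information(x, y, levels=(-1, 1)):
    """Plug-in estimate of I(X;Y) from two discrete (+/-1) sample vectors."""
    mi = 0.0
    for a in levels:
        for b in levels:
            p_ab = np.mean((x == a) & (y == b))  # joint empirical probability
            p_a = np.mean(x == a)                # marginal of X
            p_b = np.mean(y == b)                # marginal of Y
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):
    """Edge set of a maximum-weight spanning tree over pairwise empirical
    mutual informations (Prim's algorithm).

    samples: (n, p) array of +/-1 observations, one row per sample.
    """
    _, p = samples.shape
    # Pairwise empirical mutual information matrix.
    w = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            w[i, j] = w[j, i] = empirical_mutual_information(
                samples[:, i], samples[:, j]
            )
    # Prim's algorithm, greedily growing the maximum-weight tree from vertex 0.
    in_tree = {0}
    edges = []
    while len(in_tree) < p:
        best = max(
            ((i, j) for i in in_tree for j in range(p) if j not in in_tree),
            key=lambda e: w[e],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

The paper's guarantees concern exactly this estimator applied to the noisy observations: with on the order of $\mathcal{O}(\log(p/\delta))$ samples in the Ising case, the recovered tree matches the hidden one with probability at least $1-\delta$.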

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-nikolakakis19a,
  title     = {Learning Tree Structures from Noisy Data},
  author    = {Nikolakakis, Konstantinos E. and Kalogerias, Dionysios S. and Sarwate, Anand D.},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {1771--1782},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/nikolakakis19a/nikolakakis19a.pdf},
  url       = {https://proceedings.mlr.press/v89/nikolakakis19a.html},
  abstract  = {We provide high-probability sample complexity guarantees for exact structure recovery of tree-structured graphical models, when only noisy observations of the respective vertex emissions are available. We assume that the hidden variables follow either an Ising model or a Gaussian graphical model, and the observables are noise-corrupted versions of the hidden variables: We consider multiplicative $\pm 1$ binary noise for Ising models, and additive Gaussian noise for Gaussian models. Such hidden models arise naturally in a variety of applications such as physics, biology, computer science, and finance. We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known \textit{Chow-Liu algorithm} and provide formal sample complexity guarantees for exact recovery. In particular, for a tree with $p$ vertices and probability of failure $\delta>0$, we show that the number of necessary samples for exact structure recovery is of the order of $\mathcal{O}(\log(p/\delta))$ for Ising models (which remains the \textit{same as in the noiseless case}), and $\mathcal{O}(\mathrm{polylog}(p/\delta))$ for Gaussian models.}
}
Endnote
%0 Conference Paper
%T Learning Tree Structures from Noisy Data
%A Konstantinos E. Nikolakakis
%A Dionysios S. Kalogerias
%A Anand D. Sarwate
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-nikolakakis19a
%I PMLR
%P 1771--1782
%U https://proceedings.mlr.press/v89/nikolakakis19a.html
%V 89
%X We provide high-probability sample complexity guarantees for exact structure recovery of tree-structured graphical models, when only noisy observations of the respective vertex emissions are available. We assume that the hidden variables follow either an Ising model or a Gaussian graphical model, and the observables are noise-corrupted versions of the hidden variables: We consider multiplicative $\pm 1$ binary noise for Ising models, and additive Gaussian noise for Gaussian models. Such hidden models arise naturally in a variety of applications such as physics, biology, computer science, and finance. We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known \textit{Chow-Liu algorithm} and provide formal sample complexity guarantees for exact recovery. In particular, for a tree with $p$ vertices and probability of failure $\delta>0$, we show that the number of necessary samples for exact structure recovery is of the order of $\mathcal{O}(\log(p/\delta))$ for Ising models (which remains the \textit{same as in the noiseless case}), and $\mathcal{O}(\mathrm{polylog}(p/\delta))$ for Gaussian models.
APA
Nikolakakis, K.E., Kalogerias, D.S. & Sarwate, A.D. (2019). Learning Tree Structures from Noisy Data. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1771-1782. Available from https://proceedings.mlr.press/v89/nikolakakis19a.html.