Learning Mixtures of Discrete Product Distributions using Spectral Decompositions

Prateek Jain, Sewoong Oh
Proceedings of The 27th Conference on Learning Theory, PMLR 35:824-856, 2014.

Abstract

We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have sample/time complexity that is exponential in the number of mixtures. In this paper, we introduce a polynomial time/sample complexity method for learning a mixture of r discrete product distributions over {1, 2, …, \ell}^n, for general \ell and r. We show that our approach is consistent and further provide finite sample guarantees. We use recently developed techniques from tensor decompositions for moment matching. A crucial step in these approaches is to construct certain tensors with low-rank spectral decompositions. These tensors are typically estimated from the sample moments. The main challenge in learning mixtures of discrete product distributions is that the corresponding low-rank tensors cannot be obtained directly from the sample moments. Instead, we need to estimate a low-rank matrix using only off-diagonal entries, and estimate a tensor using a few linear measurements. We give an alternating minimization based method to estimate the low-rank matrix, and formulate the tensor estimation problem as a least-squares problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v35-jain14, title = {Learning Mixtures of Discrete Product Distributions using Spectral Decompositions}, author = {Jain, Prateek and Oh, Sewoong}, booktitle = {Proceedings of The 27th Conference on Learning Theory}, pages = {824--856}, year = {2014}, editor = {Balcan, Maria Florina and Feldman, Vitaly and Szepesvári, Csaba}, volume = {35}, series = {Proceedings of Machine Learning Research}, address = {Barcelona, Spain}, month = {13--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v35/jain14.pdf}, url = {https://proceedings.mlr.press/v35/jain14.html}, abstract = {We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have sample/time complexity that is exponential in the number of mixtures. In this paper, we introduce a polynomial time/sample complexity method for learning a mixture of r discrete product distributions over {1, 2, …, \ell}^n, for general \ell and r. We show that our approach is consistent and further provide finite sample guarantees. We use recently developed techniques from tensor decompositions for moment matching. A crucial step in these approaches is to construct certain tensors with low-rank spectral decompositions. These tensors are typically estimated from the sample moments. The main challenge in learning mixtures of discrete product distributions is that the corresponding low-rank tensors cannot be obtained directly from the sample moments. Instead, we need to estimate a low-rank matrix using only off-diagonal entries, and estimate a tensor using a few linear measurements. We give an alternating minimization based method to estimate the low-rank matrix, and formulate the tensor estimation problem as a least-squares problem. } }
Endnote
%0 Conference Paper %T Learning Mixtures of Discrete Product Distributions using Spectral Decompositions %A Prateek Jain %A Sewoong Oh %B Proceedings of The 27th Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2014 %E Maria Florina Balcan %E Vitaly Feldman %E Csaba Szepesvári %F pmlr-v35-jain14 %I PMLR %P 824--856 %U https://proceedings.mlr.press/v35/jain14.html %V 35 %X We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have sample/time complexity that is exponential in the number of mixtures. In this paper, we introduce a polynomial time/sample complexity method for learning a mixture of r discrete product distributions over {1, 2, …, \ell}^n, for general \ell and r. We show that our approach is consistent and further provide finite sample guarantees. We use recently developed techniques from tensor decompositions for moment matching. A crucial step in these approaches is to construct certain tensors with low-rank spectral decompositions. These tensors are typically estimated from the sample moments. The main challenge in learning mixtures of discrete product distributions is that the corresponding low-rank tensors cannot be obtained directly from the sample moments. Instead, we need to estimate a low-rank matrix using only off-diagonal entries, and estimate a tensor using a few linear measurements. We give an alternating minimization based method to estimate the low-rank matrix, and formulate the tensor estimation problem as a least-squares problem.
RIS
TY - CPAPER TI - Learning Mixtures of Discrete Product Distributions using Spectral Decompositions AU - Prateek Jain AU - Sewoong Oh BT - Proceedings of The 27th Conference on Learning Theory DA - 2014/05/29 ED - Maria Florina Balcan ED - Vitaly Feldman ED - Csaba Szepesvári ID - pmlr-v35-jain14 PB - PMLR DP - Proceedings of Machine Learning Research VL - 35 SP - 824 EP - 856 L1 - http://proceedings.mlr.press/v35/jain14.pdf UR - https://proceedings.mlr.press/v35/jain14.html AB - We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have sample/time complexity that is exponential in the number of mixtures. In this paper, we introduce a polynomial time/sample complexity method for learning a mixture of r discrete product distributions over {1, 2, …, \ell}^n, for general \ell and r. We show that our approach is consistent and further provide finite sample guarantees. We use recently developed techniques from tensor decompositions for moment matching. A crucial step in these approaches is to construct certain tensors with low-rank spectral decompositions. These tensors are typically estimated from the sample moments. The main challenge in learning mixtures of discrete product distributions is that the corresponding low-rank tensors cannot be obtained directly from the sample moments. Instead, we need to estimate a low-rank matrix using only off-diagonal entries, and estimate a tensor using a few linear measurements. We give an alternating minimization based method to estimate the low-rank matrix, and formulate the tensor estimation problem as a least-squares problem. ER -
APA
Jain, P. & Oh, S.. (2014). Learning Mixtures of Discrete Product Distributions using Spectral Decompositions. Proceedings of The 27th Conference on Learning Theory, in Proceedings of Machine Learning Research 35:824-856 Available from https://proceedings.mlr.press/v35/jain14.html.

Related Material