Learning Mixtures of Linear Regressions with Nearly Optimal Complexity

Yuanzhi Li, Yingyu Liang
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1125-1144, 2018.

Abstract

Mixtures of Linear Regressions (MLR) is an important mixture model with many applications. In this model, each observation is generated from one of several unknown linear regression components, and the identity of the generating component is also unknown. Previous works either make strong assumptions on the data distribution or have high complexity. This paper proposes a fixed-parameter tractable algorithm for the problem under general conditions, which achieves global convergence and whose sample complexity scales nearly linearly in the dimension. In particular, unlike previous works that require the data to come from the standard Gaussian, the algorithm allows the data to come from Gaussians with different covariances. When the condition number of the covariances and the number of components are fixed, the algorithm has nearly optimal sample complexity $N = \tilde{O}(d)$ as well as nearly optimal computational complexity $\tilde{O}(Nd)$, where $d$ is the dimension of the data space. To the best of our knowledge, this is the first such recovery guarantee for this general setting.
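For reference, the generative model the abstract describes can be written as a minimal sketch in standard MLR notation; the symbols (mixing weights $p_j$, regression vectors $w_j$, component covariances $\Sigma_j$, and noise $\varepsilon$) are illustrative and not necessarily the paper's own:

$$z \sim \mathrm{Categorical}(p_1, \dots, p_k), \qquad x \mid z \sim \mathcal{N}(0, \Sigma_z), \qquad y = \langle w_z, x \rangle + \varepsilon,$$

where only the pair $(x, y)$ is observed and the component identity $z$ is hidden; the goal is to recover $w_1, \dots, w_k$ from $N$ such samples.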

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-li18b,
  title = {Learning Mixtures of Linear Regressions with Nearly Optimal Complexity},
  author = {Li, Yuanzhi and Liang, Yingyu},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages = {1125--1144},
  year = {2018},
  editor = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume = {75},
  series = {Proceedings of Machine Learning Research},
  month = {06--09 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v75/li18b/li18b.pdf},
  url = {https://proceedings.mlr.press/v75/li18b.html},
  abstract = {Mixtures of Linear Regressions (MLR) is an important mixture model with many applications. In this model, each observation is generated from one of several unknown linear regression components, and the identity of the generating component is also unknown. Previous works either make strong assumptions on the data distribution or have high complexity. This paper proposes a fixed-parameter tractable algorithm for the problem under general conditions, which achieves global convergence and whose sample complexity scales nearly linearly in the dimension. In particular, unlike previous works that require the data to come from the standard Gaussian, the algorithm allows the data to come from Gaussians with different covariances. When the condition number of the covariances and the number of components are fixed, the algorithm has nearly optimal sample complexity $N = \tilde{O}(d)$ as well as nearly optimal computational complexity $\tilde{O}(Nd)$, where $d$ is the dimension of the data space. To the best of our knowledge, this is the first such recovery guarantee for this general setting.}
}
Endnote
%0 Conference Paper
%T Learning Mixtures of Linear Regressions with Nearly Optimal Complexity
%A Yuanzhi Li
%A Yingyu Liang
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-li18b
%I PMLR
%P 1125--1144
%U https://proceedings.mlr.press/v75/li18b.html
%V 75
%X Mixtures of Linear Regressions (MLR) is an important mixture model with many applications. In this model, each observation is generated from one of several unknown linear regression components, and the identity of the generating component is also unknown. Previous works either make strong assumptions on the data distribution or have high complexity. This paper proposes a fixed-parameter tractable algorithm for the problem under general conditions, which achieves global convergence and whose sample complexity scales nearly linearly in the dimension. In particular, unlike previous works that require the data to come from the standard Gaussian, the algorithm allows the data to come from Gaussians with different covariances. When the condition number of the covariances and the number of components are fixed, the algorithm has nearly optimal sample complexity $N = \tilde{O}(d)$ as well as nearly optimal computational complexity $\tilde{O}(Nd)$, where $d$ is the dimension of the data space. To the best of our knowledge, this is the first such recovery guarantee for this general setting.
APA
Li, Y. & Liang, Y. (2018). Learning Mixtures of Linear Regressions with Nearly Optimal Complexity. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1125-1144. Available from https://proceedings.mlr.press/v75/li18b.html.