Streaming Coresets for Symmetric Tensor Factorization

Rachit Chhaya, Jayesh Choudhari, Anirban Dasgupta, Supratim Shit
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1855-1865, 2020.

Abstract

Factorizing tensors has recently become an important optimization module in a number of machine learning pipelines, especially in latent variable models. We show how to do this efficiently in the streaming setting. Given a set of $n$ vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear number of these vectors as a coreset, while guaranteeing that the CP decomposition of the $p$-moment tensor of the coreset approximates the corresponding decomposition of the $p$-moment tensor computed from the full data. We introduce two novel algorithmic techniques: online filtering and kernelization. Using these two, we present four algorithms that achieve different tradeoffs of coreset size, update time, and working space, beating or matching various state-of-the-art algorithms. In the case of matrices (order-2 tensors), our online row sampling algorithm guarantees a $(1 \pm \epsilon)$ relative-error spectral approximation. We show applications of our algorithms in learning single topic models.
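As a rough illustration of the object the abstract refers to, the sketch below builds a dense $p$-moment tensor $\sum_i w_i \, a_i^{\otimes p}$ from the rows of a data matrix and evaluates it against a query vector. It is not the paper's method: the names `moment_tensor` and `contract_all_modes` are hypothetical, the data is synthetic, and the subsample is a plain reweighted uniform sample used only as a baseline, with no approximation guarantee attached.

```python
import numpy as np


def moment_tensor(A, weights=None, p=3):
    """Dense p-moment tensor sum_i w_i * (a_i outer ... outer a_i) of the rows of A.

    Storage is d**p, so this is only meant for small d.
    """
    n, d = A.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    T = np.zeros((d,) * p)
    for a, w_i in zip(A, w):
        outer = a
        for _ in range(p - 1):
            outer = np.multiply.outer(outer, a)  # adds one more mode
        T += w_i * outer
    return T


def contract_all_modes(T, x):
    """Return T(x, ..., x) by contracting each mode of T with x in turn."""
    while np.ndim(T) > 0:
        T = T @ x  # contracts the last remaining mode
    return float(T)


rng = np.random.default_rng(0)
n, d, p, m = 2000, 5, 3, 200  # assumed sizes, chosen only for the demo
A = rng.standard_normal((n, d))
x = rng.standard_normal(d)

T_full = moment_tensor(A, p=p)

# Hypothetical baseline: m rows chosen uniformly, each reweighted by n / m.
idx = rng.choice(n, size=m, replace=False)
T_sub = moment_tensor(A[idx], weights=np.full(m, n / m), p=p)

full_value = contract_all_modes(T_full, x)
sub_value = contract_all_modes(T_sub, x)
print(full_value, sub_value)
```

Contracting every mode with the same vector $x$ gives $\sum_i w_i (a_i^\top x)^p$, so the two printed values can be compared directly. A coreset in the sense of the abstract would pick rows and weights so that such quantities stay close for the queries of interest; uniform sampling is shown here only because it is the simplest thing to write down.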

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-chhaya20a,
  title = {Streaming Coresets for Symmetric Tensor Factorization},
  author = {Chhaya, Rachit and Choudhari, Jayesh and Dasgupta, Anirban and Shit, Supratim},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {1855--1865},
  year = {2020},
  editor = {Hal Daumé III and Aarti Singh},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/chhaya20a/chhaya20a.pdf},
  url = {http://proceedings.mlr.press/v119/chhaya20a.html},
  abstract = {Factorizing tensors has recently become an important optimization module in a number of machine learning pipelines, especially in latent variable models. We show how to do this efficiently in the streaming setting. Given a set of $n$ vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear number of these vectors as coreset, while guaranteeing that the CP decomposition of the $p$-moment tensor of the coreset approximates the corresponding decomposition of the $p$-moment tensor computed from the full data. We introduce two novel algorithmic techniques: online filtering and kernelization. Using these two, we present four algorithms that achieve different tradeoffs of coreset size, update time and working space, beating or matching various state of the art algorithms. In the case of matrices (2-ordered tensor), our online row sampling algorithm guarantees $(1 \pm \epsilon)$ relative error spectral approximation. We show applications of our algorithms in learning single topic modeling.}
}
Endnote
%0 Conference Paper
%T Streaming Coresets for Symmetric Tensor Factorization
%A Rachit Chhaya
%A Jayesh Choudhari
%A Anirban Dasgupta
%A Supratim Shit
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-chhaya20a
%I PMLR
%P 1855--1865
%U http://proceedings.mlr.press/v119/chhaya20a.html
%V 119
%X Factorizing tensors has recently become an important optimization module in a number of machine learning pipelines, especially in latent variable models. We show how to do this efficiently in the streaming setting. Given a set of $n$ vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear number of these vectors as coreset, while guaranteeing that the CP decomposition of the $p$-moment tensor of the coreset approximates the corresponding decomposition of the $p$-moment tensor computed from the full data. We introduce two novel algorithmic techniques: online filtering and kernelization. Using these two, we present four algorithms that achieve different tradeoffs of coreset size, update time and working space, beating or matching various state of the art algorithms. In the case of matrices (2-ordered tensor), our online row sampling algorithm guarantees $(1 \pm \epsilon)$ relative error spectral approximation. We show applications of our algorithms in learning single topic modeling.
APA
Chhaya, R., Choudhari, J., Dasgupta, A., & Shit, S. (2020). Streaming Coresets for Symmetric Tensor Factorization. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:1855-1865. Available from http://proceedings.mlr.press/v119/chhaya20a.html
