Detecting Correlations with Little Memory and Communication

Yuval Dagan, Ohad Shamir
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1145-1198, 2018.

Abstract

We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines. We prove a tight trade-off between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required memory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir (2014), which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication trade-offs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.
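To make the quadratic memory scale mentioned in the abstract concrete, below is a minimal sketch (not from the paper; the ±1 data distribution, dimensions, and function name are illustrative assumptions) of the naive baseline for detecting a pairwise correlation: maintaining the full d x d empirical correlation matrix, whose accumulator alone already occupies on the order of d^2 memory.

import numpy as np

def empirical_pairwise_correlations(samples):
    # Accumulate the full d x d matrix of empirical second moments over a
    # single streaming pass, then report the most correlated distinct
    # coordinate pair. The accumulator alone takes Theta(d^2) memory.
    n, d = samples.shape
    acc = np.zeros((d, d))
    for x in samples:
        acc += np.outer(x, x)
    corr = acc / n                           # empirical correlations for +/-1 coordinates
    np.fill_diagonal(corr, -np.inf)          # ignore self-correlations
    i, j = np.unravel_index(np.argmax(corr), corr.shape)
    return (i, j), corr[i, j]

# Illustrative usage: +/-1 coordinates, with coordinates 0 and 1 correlated.
rng = np.random.default_rng(0)
d, n = 20, 5000
X = rng.choice([-1.0, 1.0], size=(n, d))
agree = rng.random(n) < 0.75
X[:, 1] = np.where(agree, X[:, 0], -X[:, 0])  # Corr(X_0, X_1) = 2*0.75 - 1 = 0.5
print(empirical_pairwise_correlations(X))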

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-dagan18a,
  title     = {Detecting Correlations with Little Memory and Communication},
  author    = {Dagan, Yuval and Shamir, Ohad},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages     = {1145--1198},
  year      = {2018},
  editor    = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume    = {75},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v75/dagan18a/dagan18a.pdf},
  url       = {https://proceedings.mlr.press/v75/dagan18a.html},
  abstract  = {We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines. We prove a tight trade-off between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required memory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir (2014), which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication trade-offs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.}
}
Endnote
%0 Conference Paper
%T Detecting Correlations with Little Memory and Communication
%A Yuval Dagan
%A Ohad Shamir
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-dagan18a
%I PMLR
%P 1145--1198
%U https://proceedings.mlr.press/v75/dagan18a.html
%V 75
%X We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines. We prove a tight trade-off between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required memory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir (2014), which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication trade-offs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.
APA
Dagan, Y. & Shamir, O. (2018). Detecting Correlations with Little Memory and Communication. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1145-1198. Available from https://proceedings.mlr.press/v75/dagan18a.html.