Detecting Correlations with Little Memory and Communication
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1145-1198, 2018.
Abstract
We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines. We prove a tight tradeoff between the memory/communication complexity and the sample complexity, implying (for example) that to detect pairwise correlations with optimal sample complexity, the number of required memory/communication bits is at least quadratic in the dimension. Our results substantially improve those of Shamir (2014), which studied a similar question in a much more restricted setting. To the best of our knowledge, these are the first provable sample/memory/communication tradeoffs for a practical estimation problem, using standard distributions, and in the natural regime where the memory/communication budget is larger than the size of a single data point. To derive our theorems, we prove a new information-theoretic result, which may be relevant for studying other information-constrained learning problems.
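To make the memory bound concrete, here is a minimal sketch (not from the paper; the function name and data model are illustrative) of the natural streaming approach to detecting a correlated coordinate pair: maintaining empirical second moments for all pairs requires a d-by-d table, i.e. memory quadratic in the dimension, which is the regime the paper's lower bound addresses.

```python
import numpy as np

def detect_correlated_pair(samples):
    """Scan a stream of d-dimensional samples once, accumulating all
    pairwise second moments in a d x d table (quadratic memory), and
    return the off-diagonal pair with the largest empirical moment."""
    d = samples.shape[1]
    sums = np.zeros((d, d))  # O(d^2) memory: one cell per coordinate pair
    n = 0
    for x in samples:
        sums += np.outer(x, x)  # single pass over the data
        n += 1
    moments = sums / n
    np.fill_diagonal(moments, -np.inf)  # ignore variances on the diagonal
    i, j = np.unravel_index(np.argmax(moments), moments.shape)
    return (int(min(i, j)), int(max(i, j)))
```

With standard-normal coordinates this recovers a planted correlated pair from a few thousand samples; the paper's result says that any algorithm using substantially less than quadratic memory must pay with a larger sample complexity.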