Fundamental Limits of Distributed Covariance Matrix Estimation Under Communication Constraints
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:41927-41958, 2024.
Abstract
Estimating high-dimensional covariance matrices is a core task in many statistical and machine learning applications. This work considers a scenario in which two collaborating agents access disjoint dimensions of $m$ samples from a high-dimensional random vector and can each communicate only a limited number of bits to a central server, which seeks to accurately approximate the covariance matrix. We analyze the fundamental trade-off between communication cost, number of samples, and estimation accuracy. We prove a lower bound on the error achievable by any estimator, highlighting the impact of the dimension, the number of samples, and the communication budget. Furthermore, we present an algorithm that attains this lower bound up to a logarithmic factor, demonstrating its near-optimality in practical settings.
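To make the setting concrete, the following is a minimal sketch of the distributed estimation pipeline the abstract describes, assuming a simple uniform scalar quantizer as the compression scheme. It does not reproduce the paper's near-optimal algorithm; the splitting of coordinates, the bit budget, and all function names here (e.g. `quantize`) are illustrative assumptions only.

```python
# Illustrative sketch of two agents compressing disjoint coordinate blocks of
# m samples and a server estimating the covariance from the quantized data.
# The uniform scalar quantizer below is an assumed stand-in, NOT the paper's scheme.
import numpy as np

def quantize(x, bits, lo=-4.0, hi=4.0):
    """Uniform scalar quantizer: map each entry of x to one of 2**bits levels."""
    levels = 2 ** bits
    x_clipped = np.clip(x, lo, hi)
    idx = np.round((x_clipped - lo) / (hi - lo) * (levels - 1)).astype(int)
    return lo + idx * (hi - lo) / (levels - 1)  # dequantized representative values

rng = np.random.default_rng(0)
d, m, bits = 8, 10_000, 4                      # dimension, samples, bits per entry

# Ground-truth covariance and m samples of the d-dimensional random vector.
A = rng.standard_normal((d, d))
Sigma = A @ A.T / d
X = rng.multivariate_normal(np.zeros(d), Sigma, size=m)   # shape (m, d)

# Agent 1 observes the first d//2 coordinates, agent 2 the rest (disjoint blocks).
X1, X2 = X[:, : d // 2], X[:, d // 2 :]

# Each agent transmits a quantized version of its block,
# i.e. m * (d/2) * bits bits per agent.
Q1, Q2 = quantize(X1, bits), quantize(X2, bits)

# The server stitches the blocks together and forms the empirical covariance,
# including the cross-covariance between the two agents' coordinates.
Q = np.hstack([Q1, Q2])
Sigma_hat = Q.T @ Q / m

print("operator-norm error:", np.linalg.norm(Sigma_hat - Sigma, 2))
```

This naive per-entry quantization only illustrates how communication, sample size, and accuracy interact in the setup; the paper's results characterize the best error any such protocol can achieve for a given bit budget and give a scheme matching that limit up to a logarithmic factor.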