Faster Algorithms for High-Dimensional Robust Covariance Estimation

Yu Cheng, Ilias Diakonikolas, Rong Ge, David P. Woodruff
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:727-757, 2019.

Abstract

We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given $N = \tilde{\Omega}(d^2/\epsilon^2)$ samples from a $d$-dimensional Gaussian distribution, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithm runs in time $\tilde{O}(d^{3.26})/\mathrm{poly}(\epsilon)$ and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes $\tilde{\Omega}(d^{2 \omega})$ when $\epsilon = \Omega(1)$, where $\omega$ is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
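To make the setting in the abstract concrete, below is a minimal NumPy sketch of the ε-contamination model: draw N samples from a d-dimensional Gaussian, let an adversary replace an ε-fraction of them arbitrarily, and compute the (non-robust) empirical covariance, the baseline whose cost the paper's runtime nearly matches. This is only an illustration of the problem setup, not the paper's algorithm; the function names and the particular corruption used are invented here for demonstration.

import numpy as np

def sample_corrupted_gaussian(N, d, eps, true_cov, rng):
    """Draw N samples from N(0, true_cov), then let an 'adversary'
    replace an eps-fraction of them. The constant-spike corruption
    below is one illustrative choice; the model allows any replacement."""
    X = rng.multivariate_normal(np.zeros(d), true_cov, size=N)
    n_bad = int(eps * N)
    X[:n_bad] = 10.0 * np.sqrt(d)  # overwrite n_bad rows with a large outlier
    return X

def empirical_covariance(X):
    """Non-robust baseline: the N x d by d x N product whose cost
    (for N = Omega~(d^2/eps^2)) is the runtime benchmark cited above."""
    return X.T @ X / X.shape[0]

rng = np.random.default_rng(0)
d, eps = 100, 0.05
N = 10 * d**2  # moderate sample size for illustration; the paper needs N = Omega~(d^2/eps^2)
Sigma = np.eye(d)
X = sample_corrupted_gaussian(N, d, eps, Sigma, rng)
# Even a 5% corruption ruins the empirical estimate in spectral norm.
err = np.linalg.norm(empirical_covariance(X) - Sigma, ord=2)
print(f"spectral error of empirical covariance: {err:.2f}")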

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-cheng19a,
  title     = {Faster Algorithms for High-Dimensional Robust Covariance Estimation},
  author    = {Cheng, Yu and Diakonikolas, Ilias and Ge, Rong and Woodruff, David P.},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {727--757},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/cheng19a/cheng19a.pdf},
  url       = {https://proceedings.mlr.press/v99/cheng19a.html},
  abstract  = {We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given $N = \tilde{\Omega}(d^2/\epsilon^2)$ samples from a $d$-dimensional Gaussian distribution, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithm runs in time $\tilde{O}(d^{3.26})/\mathrm{poly}(\epsilon)$ and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes $\tilde{\Omega}(d^{2 \omega})$ when $\epsilon = \Omega(1)$, where $\omega$ is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.}
}
Endnote
%0 Conference Paper
%T Faster Algorithms for High-Dimensional Robust Covariance Estimation
%A Yu Cheng
%A Ilias Diakonikolas
%A Rong Ge
%A David P. Woodruff
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu
%F pmlr-v99-cheng19a
%I PMLR
%P 727--757
%U https://proceedings.mlr.press/v99/cheng19a.html
%V 99
%X We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given $N = \tilde{\Omega}(d^2/\epsilon^2)$ samples from a $d$-dimensional Gaussian distribution, an $\epsilon$-fraction of which may be arbitrarily corrupted, our algorithm runs in time $\tilde{O}(d^{3.26})/\mathrm{poly}(\epsilon)$ and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes $\tilde{\Omega}(d^{2 \omega})$ when $\epsilon = \Omega(1)$, where $\omega$ is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques.
APA
Cheng, Y., Diakonikolas, I., Ge, R. & Woodruff, D.P. (2019). Faster Algorithms for High-Dimensional Robust Covariance Estimation. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:727-757. Available from https://proceedings.mlr.press/v99/cheng19a.html.
