Streaming Algorithms for High-Dimensional Robust Statistics

Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:5061-5117, 2022.

Abstract

We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust statistics tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber’s contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-diakonikolas22a, title = {Streaming Algorithms for High-Dimensional Robust Statistics}, author = {Diakonikolas, Ilias and Kane, Daniel M. and Pensia, Ankit and Pittas, Thanasis}, booktitle = {Proceedings of the 39th International Conference on Machine Learning}, pages = {5061--5117}, year = {2022}, editor = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan}, volume = {162}, series = {Proceedings of Machine Learning Research}, month = {17--23 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v162/diakonikolas22a/diakonikolas22a.pdf}, url = {https://proceedings.mlr.press/v162/diakonikolas22a.html}, abstract = {We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust statistics tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber’s contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.} }
Endnote
%0 Conference Paper %T Streaming Algorithms for High-Dimensional Robust Statistics %A Ilias Diakonikolas %A Daniel M. Kane %A Ankit Pensia %A Thanasis Pittas %B Proceedings of the 39th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2022 %E Kamalika Chaudhuri %E Stefanie Jegelka %E Le Song %E Csaba Szepesvari %E Gang Niu %E Sivan Sabato %F pmlr-v162-diakonikolas22a %I PMLR %P 5061--5117 %U https://proceedings.mlr.press/v162/diakonikolas22a.html %V 162 %X We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust statistics tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber’s contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.
APA
Diakonikolas, I., Kane, D.M., Pensia, A. & Pittas, T.. (2022). Streaming Algorithms for High-Dimensional Robust Statistics. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:5061-5117 Available from https://proceedings.mlr.press/v162/diakonikolas22a.html.

Related Material