Being Robust (in High Dimensions) Can Be Practical

Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, Alistair Stewart
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:999-1008, 2017.

Abstract

Robust estimation is much more challenging in high dimensions than it is in one dimension: most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.
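The spectral filtering idea behind this line of work can be illustrated with a minimal sketch: when the top eigenvalue of the empirical covariance is much larger than the identity would allow, points with extreme projections along the top eigenvector are discarded. The function name `filter_robust_mean`, the stopping tolerance `tol`, and the fixed removal schedule below are illustrative assumptions, not the paper's exact algorithm or its guarantees.

```python
import numpy as np

def filter_robust_mean(X, eps, tol=1.5, max_iter=50):
    """Illustrative filter-style robust mean estimator.

    Assumes inliers are roughly isotropic (covariance near identity).
    Repeatedly checks the top eigenvalue of the empirical covariance;
    if it is suspiciously large, removes the eps-fraction of points
    that project farthest along the corresponding eigenvector.
    """
    X = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
        if eigvals[-1] <= tol:
            # Covariance looks identity-like: no detectable corruption left.
            return mu
        v = eigvecs[:, -1]  # direction of largest variance
        scores = np.abs((X - mu) @ v)
        # Drop the most extreme eps-fraction of points along v.
        k = max(1, int(eps * len(X)))
        keep = np.argsort(scores)[:-k]
        X = X[keep]
    return X.mean(axis=0)
```

On synthetic data with a constant fraction of far-away corruptions, this sketch recovers the mean far more accurately than the empirical average, which shifts by an amount growing with the dimension.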

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-diakonikolas17a,
  title     = {Being Robust (in High Dimensions) Can Be Practical},
  author    = {Ilias Diakonikolas and Gautam Kamath and Daniel M. Kane and Jerry Li and Ankur Moitra and Alistair Stewart},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {999--1008},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/diakonikolas17a/diakonikolas17a.pdf},
  url       = {https://proceedings.mlr.press/v70/diakonikolas17a.html},
  abstract  = {Robust estimation is much more challenging in high dimensions than it is in one dimension: most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.}
}
Endnote
%0 Conference Paper
%T Being Robust (in High Dimensions) Can Be Practical
%A Ilias Diakonikolas
%A Gautam Kamath
%A Daniel M. Kane
%A Jerry Li
%A Ankur Moitra
%A Alistair Stewart
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-diakonikolas17a
%I PMLR
%P 999--1008
%U https://proceedings.mlr.press/v70/diakonikolas17a.html
%V 70
%X Robust estimation is much more challenging in high dimensions than it is in one dimension: most techniques either lead to intractable optimization problems or estimators that can tolerate only a tiny fraction of errors. Recent work in theoretical computer science has shown that, in appropriate distributional models, it is possible to robustly estimate the mean and covariance with polynomial-time algorithms that can tolerate a constant fraction of corruptions, independent of the dimension. However, the sample and time complexity of these algorithms is prohibitively large for high-dimensional applications. In this work, we address both of these issues by establishing sample complexity bounds that are optimal, up to logarithmic factors, as well as giving various refinements that allow the algorithms to tolerate a much larger fraction of corruptions. Finally, we show on both synthetic and real data that our algorithms have state-of-the-art performance and suddenly make high-dimensional robust estimation a realistic possibility.
APA
Diakonikolas, I., Kamath, G., Kane, D.M., Li, J., Moitra, A. & Stewart, A. (2017). Being Robust (in High Dimensions) Can Be Practical. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:999-1008. Available from https://proceedings.mlr.press/v70/diakonikolas17a.html.