Algorithms and Hardness for Robust Subspace Recovery

Moritz Hardt; Ankur Moitra

Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:354-375, 2013.

Abstract

We consider a fundamental problem in unsupervised learning called subspace recovery: given a collection of m points in R^n, if many but not necessarily all of these points are contained in a d-dimensional subspace T, can we find it? The points contained in T are called inliers and the remaining points are outliers. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to adversarial outliers, and the estimators from robust statistics are hard to compute in high dimensions. This is a serious and persistent issue not just in this application, but for many other problems in unsupervised learning. Are there algorithms for subspace recovery that are both robust to outliers and efficient? We give an algorithm that finds T when it contains more than a d/n fraction of the points. Hence, for, say, d = n/2, this estimator is both easy to compute and well-behaved when there is a constant fraction of outliers. We prove that it is small set expansion hard to find T when the fraction of errors is any larger, and so our estimator is an optimal compromise between efficiency and robustness. In fact, this basic problem has a surprising number of connections to other areas, including small set expansion, matroid theory and functional analysis, that we make use of here.
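
The d/n threshold in the abstract can be illustrated with a small experiment. The sketch below is not taken from the paper and should not be read as the authors' algorithm. It assumes that the points are in general position (inliers generic within T, outliers generic in R^n), so that a random sample of n points is linearly dependent only when it contains more than d inliers, and any dependence then involves inliers only. All function names (`null_space`, `make_instance`, `find_dependent_points`) and the instance sizes are hypothetical choices for illustration. Exact rational arithmetic is used so that no numerical tolerance is needed.

```python
import random
from fractions import Fraction


def null_space(vectors):
    """Basis of all coefficient vectors c with sum_j c[j] * vectors[j] == 0."""
    k, n = len(vectors), len(vectors[0])
    # Row-reduce the n x k matrix whose columns are the given vectors.
    rows = [[Fraction(vectors[j][i]) for j in range(k)] for i in range(n)]
    pivots = []
    for col in range(k):
        r = len(pivots)
        piv = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if piv is None:
            continue  # no pivot available: this column is a free variable
        rows[r], rows[piv] = rows[piv], rows[r]
        pivot_value = rows[r][col]
        rows[r] = [x / pivot_value for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
    basis = []
    for free in (c for c in range(k) if c not in pivots):
        coeffs = [Fraction(0)] * k
        coeffs[free] = Fraction(1)
        for row, p in zip(rows, pivots):
            coeffs[p] = -row[free]
        basis.append(coeffs)
    return basis


def make_instance(n, d, num_inliers, num_outliers, rng):
    """Integer points in R^n; the first num_inliers lie in a random d-dim T."""
    def rand_vec(size):
        return [rng.randint(-1000, 1000) for _ in range(size)]

    t_basis = [rand_vec(n) for _ in range(d)]
    inliers = []
    for _ in range(num_inliers):
        w = rand_vec(d)
        inliers.append([sum(w[j] * t_basis[j][i] for j in range(d))
                        for i in range(n)])
    outliers = [rand_vec(n) for _ in range(num_outliers)]
    return inliers + outliers, t_basis


def find_dependent_points(points, n, rng, max_trials=1000):
    """Sample n points until they are linearly dependent, then return the
    indices that carry a nonzero coefficient in some dependence."""
    for _ in range(max_trials):
        idx = rng.sample(range(len(points)), n)
        basis = null_space([points[i] for i in idx])
        if basis:
            return sorted(i for pos, i in enumerate(idx)
                          if any(c[pos] != 0 for c in basis))
    return None


rng = random.Random(0)
n, d = 6, 3  # d/n = 1/2; the inlier fraction below is 28/40 = 0.7
points, t_basis = make_instance(n, d, num_inliers=28, num_outliers=12, rng=rng)
found = find_dependent_points(points, n, rng)
print(found)  # indices below 28 are the inliers in this instance
```

Under the stated general-position assumption, the returned points all lie in T and span it, so T can be read off as their span. When the inlier fraction drops to d/n or below, a random sample of n points is expected to contain at most d inliers, and this simple sampling strategy no longer has the same leverage.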

Cite this Paper

BibTeX


@InProceedings{pmlr-v30-Hardt13,
  title = {Algorithms and Hardness for Robust Subspace Recovery},
  author = {Hardt, Moritz and Moitra, Ankur},
  booktitle = {Proceedings of the 26th Annual Conference on Learning Theory},
  pages = {354--375},
  year = {2013},
  editor = {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume = {30},
  series = {Proceedings of Machine Learning Research},
  address = {Princeton, NJ, USA},
  month = {12--14 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v30/Hardt13.pdf},
  url = {https://proceedings.mlr.press/v30/Hardt13.html},
  abstract = {We consider a fundamental problem in unsupervised learning called subspace recovery: given a collection of m points in R^n, if many but not necessarily all of these points are contained in a d-dimensional subspace T can we find it? The points contained in T are called inliers and the remaining points are outliers. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to adversarial outliers, and the estimators from robust statistics are hard to compute in high dimensions. This is a serious and persistent issue not just in this application, but for many other problems in unsupervised learning. Are there algorithms for subspace recovery that are both robust to outliers and efficient? We give an algorithm that finds T when it contains more than a d/n fraction of the points. Hence, for say d = n/2 this estimator is both easy to compute and well-behaved when there are a constant fraction of outliers. We prove that it is small set expansion hard to find T when the fraction of errors is any larger and so our estimator is an optimal compromise between efficiency and robustness. In fact, this basic problem has a surprising number of connections to other areas including small set expansion, matroid theory and functional analysis that we make use of here.}
}

Endnote

%0 Conference Paper
%T Algorithms and Hardness for Robust Subspace Recovery
%A Moritz Hardt
%A Ankur Moitra
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart
%F pmlr-v30-Hardt13
%I PMLR
%P 354--375
%U https://proceedings.mlr.press/v30/Hardt13.html
%V 30
%X We consider a fundamental problem in unsupervised learning called subspace recovery: given a collection of m points in R^n, if many but not necessarily all of these points are contained in a d-dimensional subspace T can we find it? The points contained in T are called inliers and the remaining points are outliers. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to adversarial outliers, and the estimators from robust statistics are hard to compute in high dimensions. This is a serious and persistent issue not just in this application, but for many other problems in unsupervised learning. Are there algorithms for subspace recovery that are both robust to outliers and efficient? We give an algorithm that finds T when it contains more than a d/n fraction of the points. Hence, for say d = n/2 this estimator is both easy to compute and well-behaved when there are a constant fraction of outliers. We prove that it is small set expansion hard to find T when the fraction of errors is any larger and so our estimator is an optimal compromise between efficiency and robustness. In fact, this basic problem has a surprising number of connections to other areas including small set expansion, matroid theory and functional analysis that we make use of here.

RIS


TY  - CPAPER
TI  - Algorithms and Hardness for Robust Subspace Recovery
AU  - Moritz Hardt
AU  - Ankur Moitra
BT  - Proceedings of the 26th Annual Conference on Learning Theory
DA  - 2013/06/13
ED  - Shai Shalev-Shwartz
ED  - Ingo Steinwart
ID  - pmlr-v30-Hardt13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 30
SP  - 354
EP  - 375
L1  - http://proceedings.mlr.press/v30/Hardt13.pdf
UR  - https://proceedings.mlr.press/v30/Hardt13.html
AB  - We consider a fundamental problem in unsupervised learning called subspace recovery: given a collection of m points in R^n, if many but not necessarily all of these points are contained in a d-dimensional subspace T can we find it? The points contained in T are called inliers and the remaining points are outliers. This problem has received considerable attention in computer science and in statistics. Yet efficient algorithms from computer science are not robust to adversarial outliers, and the estimators from robust statistics are hard to compute in high dimensions. This is a serious and persistent issue not just in this application, but for many other problems in unsupervised learning. Are there algorithms for subspace recovery that are both robust to outliers and efficient? We give an algorithm that finds T when it contains more than a d/n fraction of the points. Hence, for say d = n/2 this estimator is both easy to compute and well-behaved when there are a constant fraction of outliers. We prove that it is small set expansion hard to find T when the fraction of errors is any larger and so our estimator is an optimal compromise between efficiency and robustness. In fact, this basic problem has a surprising number of connections to other areas including small set expansion, matroid theory and functional analysis that we make use of here.
ER  -

APA


Hardt, M. & Moitra, A. (2013). Algorithms and Hardness for Robust Subspace Recovery. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:354-375. Available from https://proceedings.mlr.press/v30/Hardt13.html.
