Efficient Algorithms for Outlier-Robust Regression

Adam Klivans, Pravesh K. Kothari, Raghu Meka
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1420-1430, 2018.

Abstract

We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from a distribution ${\mathcal{D}}$ and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to ${\mathcal{D}}$, assuming that the marginal distribution of $\mathcal{D}$ over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions, including the Gaussian, strongly log-concave distributions, and the uniform distribution on the hypercube, among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind and were not even known to be information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by recent applications of the method to parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the corrupted input sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.
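Writing $\eta$ for the corrupted fraction (notation not fixed in the abstract), the non-convex problem in the last sentence can be stated as $\min_{w \in \mathbb{R}^d,\, S \subseteq [n],\, |S| \geq (1-\eta)n} \frac{1}{|S|} \sum_{i \in S} (y_i - \langle w, x_i \rangle)^2$. Below is a minimal illustrative sketch of this formulation using a simple alternating-minimization heuristic. To be clear, this is not the paper's algorithm: the paper solves an SoS convex relaxation of this objective, which is what yields the provable guarantees under certifiable hypercontractivity. The function name `trimmed_least_squares` and all parameters are hypothetical choices for illustration.

```python
import numpy as np

def trimmed_least_squares(X, y, eta, iters=50, seed=0):
    """Heuristic for the non-convex problem in the abstract: find a linear
    function w and a subset S of size >= (1 - eta) * n minimizing the
    least squares loss over S.

    NOTE: an illustrative alternating-minimization sketch, NOT the paper's
    sum-of-squares relaxation; it can get stuck in local optima.
    """
    n, d = X.shape
    k = int(np.ceil((1.0 - eta) * n))         # size of the subset to keep
    rng = np.random.default_rng(seed)
    S = rng.choice(n, size=k, replace=False)  # initial guess for the clean subset
    for _ in range(iters):
        # Step 1: fix S, optimize w -- ordinary least squares on the subset.
        w, *_ = np.linalg.lstsq(X[S], y[S], rcond=None)
        # Step 2: fix w, optimize S -- keep the k smallest squared residuals.
        residuals = (X @ w - y) ** 2
        S_new = np.argsort(residuals)[:k]
        if np.array_equal(np.sort(S_new), np.sort(S)):
            break                             # (w, S) is locally optimal
        S = S_new
    return w, S

# Toy usage: Gaussian inputs with a fraction eta of grossly corrupted labels.
rng = np.random.default_rng(1)
n, d, eta = 500, 5, 0.1
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)
bad = rng.choice(n, size=int(eta * n), replace=False)
y[bad] += 10.0                                # adversarial-style corruptions
w_hat, _ = trimmed_least_squares(X, y, eta)
print(np.linalg.norm(w_hat - w_true))
```

Both alternating steps weakly decrease the objective, so the heuristic terminates, but unlike the SoS relaxation it offers no guarantee of reaching the globally optimal (subset, fit) pair.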

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-klivans18a,
  title     = {Efficient Algorithms for Outlier-Robust Regression},
  author    = {Klivans, Adam and Kothari, Pravesh K. and Meka, Raghu},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages     = {1420--1430},
  year      = {2018},
  editor    = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume    = {75},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v75/klivans18a/klivans18a.pdf},
  url       = {https://proceedings.mlr.press/v75/klivans18a.html}
}
Endnote
%0 Conference Paper
%T Efficient Algorithms for Outlier-Robust Regression
%A Adam Klivans
%A Pravesh K. Kothari
%A Raghu Meka
%B Proceedings of the 31st Conference On Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Sébastien Bubeck
%E Vianney Perchet
%E Philippe Rigollet
%F pmlr-v75-klivans18a
%I PMLR
%P 1420--1430
%U https://proceedings.mlr.press/v75/klivans18a.html
%V 75
APA
Klivans, A., Kothari, P. K., & Meka, R. (2018). Efficient Algorithms for Outlier-Robust Regression. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1420-1430. Available from https://proceedings.mlr.press/v75/klivans18a.html.