Robust Sparse Regression under Adversarial Corruption

Yudong Chen, Constantine Caramanis, Shie Mannor
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):774-782, 2013.

Abstract

We consider high-dimensional sparse regression with arbitrary – possibly severe or coordinated – errors in the covariate matrix. We are interested in understanding how many corruptions we can tolerate while still identifying the correct support. To the best of our knowledge, neither standard outlier-rejection techniques, nor recently developed robust regression algorithms (which focus only on corrupted response variables), nor recent algorithms for dealing with stochastic noise or erasures can provide guarantees on support recovery. As we show, neither can the natural brute-force algorithm, which takes exponential time to find the subset of data points and support columns that yields the smallest regression error. We explore the power of a simple idea: replace the essential linear-algebraic calculation – the inner product – with a robust counterpart that cannot be greatly affected by a controlled number of arbitrarily corrupted points: the trimmed inner product. We consider three popular algorithms in the uncorrupted setting – Thresholding Regression, Lasso, and the Dantzig selector – and show that their counterparts obtained using the trimmed inner product are provably robust.
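
To make the trimmed inner product concrete, here is a minimal NumPy sketch of one natural trimming rule: drop the h largest-magnitude coordinatewise products before summing. The exact trimming rule, thresholds, and guarantees in the paper may differ in details; the function name and toy data below are illustrative only.

```python
import numpy as np

def trimmed_inner_product(x, y, h):
    """Inner product of x and y with the h largest-magnitude
    elementwise products discarded (one possible trimming rule;
    a sketch, not necessarily the paper's exact definition)."""
    p = x * y
    if h <= 0:
        return p.sum()
    # keep the len(p) - h products with smallest absolute value
    keep = np.argsort(np.abs(p))[: max(len(p) - h, 0)]
    return p[keep].sum()

# Toy usage: a handful of grossly corrupted coordinates barely
# move the trimmed estimate, while the plain inner product is skewed.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)
x_corrupt = x.copy()
x_corrupt[:5] += 1e3                      # five adversarially corrupted entries
print(np.dot(x_corrupt, y))               # badly skewed by the corruptions
print(trimmed_inner_product(x_corrupt, y, h=5))  # close to np.dot(x, y)
```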

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-chen13h,
  title     = {Robust Sparse Regression under Adversarial Corruption},
  author    = {Chen, Yudong and Caramanis, Constantine and Mannor, Shie},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {774--782},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/chen13h.pdf},
  url       = {https://proceedings.mlr.press/v28/chen13h.html}
}
APA
Chen, Y., Caramanis, C. & Mannor, S. (2013). Robust Sparse Regression under Adversarial Corruption. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):774-782. Available from https://proceedings.mlr.press/v28/chen13h.html.