[edit]

# Robust Linear Regression for General Feature Distribution

*Proceedings of The 26th International Conference on Artificial Intelligence and Statistics*, PMLR 206:2405-2435, 2023.

#### Abstract

We investigate robust linear regression where data may be contaminated by an oblivious adversary, i.e., an adversary that knows the data distribution but is otherwise oblivious to the realization of the data samples. This model has been previously analyzed under strong assumptions. Concretely, (i) all previous works assume that the covariance matrix of the features is positive definite; (ii) most of them assume that the features are centered. Additionally, all previous works make additional restrictive assumptions, e.g., assuming Gaussianity of the features or symmetric distribution of the corruptions. In this work, we investigate robust regression under a more general set of assumptions: (i) the covariance matrix may be either positive definite or positive semi definite, (ii) features may not be centered, (iii) no assumptions beyond boundedness (or sub-Gaussianity) of the features and the measurement noise. Under these assumptions we analyze a sequential algorithm, namely, a natural SGD variant for this problem, and show that it enjoys a fast convergence rate when the covariance matrix is positive definite. In the positive semi definite case we show that there are two regimes: if the features are centered, we can obtain a standard convergence rate; Otherwise, the adversary can cause any learner to fail arbitrarily.