[edit]

# Affine Invariant Covariance Estimation for Heavy-Tailed Distributions

*Proceedings of the Thirty-Second Conference on Learning Theory*, PMLR 99:2531-2550, 2019.

#### Abstract

In this work we provide an estimator for the covariance matrix of a heavy-tailed multivariate distribution. We prove that the proposed estimator $\widehat{\mathbf{S}}$ admits an \textit{affine-invariant} bound of the form \[ (1-\varepsilon) \mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon) \mathbf{S} \]{in} high probability, where $\mathbf{S}$ is the unknown covariance matrix, and $\preccurlyeq$ is the positive semidefinite order on symmetric matrices. The result only requires the existence of fourth-order moments, and allows for $\varepsilon = O(\sqrt{\kappa^4 d\log(d/\delta)/n})$ where $\kappa^4$ is a measure of kurtosis of the distribution, $d$ is the dimensionality of the space, $n$ is the sample size, and $1-\delta$ is the desired confidence level. More generally, we can allow for regularization with level $\lambda$, then $d$ gets replaced with the degrees of freedom number. Denoting $\text{cond}(\mathbf{S})$ the condition number of $\mathbf{S}$, the computational cost of the novel estimator is $O(d^2 n + d^3\log(\text{cond}(\mathbf{S})))$, which is comparable to the cost of the sample covariance estimator in the statistically interesing regime $n \ge d$. We consider applications of our estimator to eigenvalue estimation with relative error, and to ridge regression with heavy-tailed random design.