Choosing the p in Lp Loss: Adaptive Rates for Symmetric Mean Estimation
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:2795-2839, 2024.
Abstract
When we have a univariate distribution that is symmetric around its mean, the mean can be estimated with a rate (sample complexity) much faster than $O(1/\sqrt{n})$ in many cases. For example, given univariate random variables $Y_1, \ldots, Y_n$ distributed uniformly on $[\theta_0 - c, \theta_0 + c]$, the sample midrange $\frac{Y_{(n)}+Y_{(1)}}{2}$ maximizes the likelihood and has expected error $\mathbb{E}\bigl| \theta_0 - \frac{Y_{(n)}+Y_{(1)}}{2} \bigr| \leq 2c/n$, which is optimal and much lower than the error rate $O(1/\sqrt{n})$ of the sample mean. The optimal rate depends on the distribution, and it is generally attained by the maximum likelihood estimator (MLE). However, the MLE requires exact knowledge of the underlying distribution; if the underlying distribution is \emph{unknown}, it is an open question whether an estimator can adapt to the optimal rate. In this paper, we propose an estimator of the symmetric mean $\theta_0$ with the following properties: it requires no knowledge of the underlying distribution; it has a rate no worse than $1/\sqrt{n}$ in all cases (assuming a finite second moment); and, when the underlying distribution is compactly supported, it can attain a rate of $n^{-1/\alpha}$ up to polylog factors, where the rate parameter $\alpha$ can take on any value in $(0, 2]$ and depends on the moments of the underlying distribution. Our estimator is formed by minimizing the $L_\gamma$ loss with respect to the data, for a power $\gamma \geq 2$ chosen in a data-driven way by minimizing a criterion motivated by the asymptotic variance. Our approach can be directly applied to the regression setting where $\theta_0$ is a function of observed features, and it motivates the use of the $L_\gamma$ loss function with a data-driven $\gamma$ in certain settings.
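The following Python sketch illustrates the approach on the uniform example above: an $L_\gamma$ location estimate is computed for each candidate power $\gamma \geq 2$ on a grid, and $\gamma$ is selected by minimizing a plug-in estimate of the asymptotic variance. All names here (lp_estimate, sandwich_variance, adaptive_lp_estimate) are illustrative, and the standard M-estimation sandwich formula is used as a stand-in for the paper's selection criterion, which the abstract describes only as "motivated by the asymptotic variance"; the paper's actual criterion and guarantees are in the full text.

import numpy as np
from scipy.optimize import minimize_scalar

def lp_estimate(y, gamma):
    """Location estimate minimizing the empirical L_gamma loss (1/n) sum |y_i - t|^gamma."""
    res = minimize_scalar(
        lambda t: np.mean(np.abs(y - t) ** gamma),
        bounds=(y.min(), y.max()),
        method="bounded",
    )
    return res.x

def sandwich_variance(y, gamma):
    """Plug-in sandwich estimate of the asymptotic variance of the L_gamma
    M-estimator, E|Y-t|^(2*gamma-2) / ((gamma-1) * E|Y-t|^(gamma-2))^2,
    evaluated at the fitted location t (valid for gamma >= 2).
    This is an assumed stand-in for the paper's criterion."""
    t = lp_estimate(y, gamma)
    r = np.abs(y - t)
    num = np.mean(r ** (2 * gamma - 2))
    den = ((gamma - 1) * np.mean(r ** (gamma - 2))) ** 2
    return num / den

def adaptive_lp_estimate(y, gammas=None):
    """Choose gamma on a grid by minimizing the estimated asymptotic variance,
    then return the corresponding L_gamma location estimate."""
    if gammas is None:
        gammas = np.linspace(2.0, 10.0, 17)
    crit = [sandwich_variance(y, g) for g in gammas]
    g_star = gammas[int(np.argmin(crit))]
    return lp_estimate(y, g_star), g_star

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta0, c, n = 1.0, 1.0, 2000
    y = rng.uniform(theta0 - c, theta0 + c, size=n)

    midrange = 0.5 * (y.min() + y.max())          # MLE for the uniform model
    theta_hat, g_star = adaptive_lp_estimate(y)   # distribution-agnostic

    print(f"selected gamma      : {g_star:.2f}")
    print(f"|mean     - theta0| : {abs(y.mean() - theta0):.5f}")   # ~ 1/sqrt(n)
    print(f"|midrange - theta0| : {abs(midrange - theta0):.5f}")   # ~ 1/n
    print(f"|adaptive - theta0| : {abs(theta_hat - theta0):.5f}")

For uniform data on $[\theta_0 - c, \theta_0 + c]$, the sandwich variance of the $L_\gamma$ M-estimator works out to $c^2/(2\gamma - 1)$, which decreases in $\gamma$; the selector therefore favors large $\gamma$, pushing the estimate toward the midrange and toward the fast $O(1/n)$ rate, while heavier-tailed data would push it back toward $\gamma = 2$ (the sample mean).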