Choosing the p in Lp Loss: Adaptive Rates for Symmetric Mean Estimation

Yu-Chun Kao, Min Xu, Cun-Hui Zhang
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:2795-2839, 2024.

Abstract

When we have a univariate distribution that is symmetric around its mean, the mean can be estimated at a rate (sample complexity) much faster than $O(1/\sqrt{n})$ in many cases. For example, given univariate random variables $Y_1, \ldots, Y_n$ distributed uniformly on $[\theta_0 - c, \theta_0 + c]$, the sample midrange $\frac{Y_{(n)}+Y_{(1)}}{2}$ maximizes the likelihood and has expected error $\mathbb{E}\bigl| \theta_0 - \frac{Y_{(n)}+Y_{(1)}}{2} \bigr| \leq 2c/n$, which is optimal and much smaller than the $O(1/\sqrt{n})$ error rate of the sample mean. The optimal rate depends on the distribution, and it is generally attained by the maximum likelihood estimator (MLE). However, the MLE requires exact knowledge of the underlying distribution; when the underlying distribution is \emph{unknown}, it is an open question whether an estimator can adapt to the optimal rate. In this paper, we propose an estimator of the symmetric mean $\theta_0$ with the following properties: it requires no knowledge of the underlying distribution; it has a rate no worse than $1/\sqrt{n}$ in all cases (assuming a finite second moment); and, when the underlying distribution is compactly supported, it can attain a rate of $n^{-1/\alpha}$ up to polylog factors, where the rate parameter $\alpha$ can take any value in $(0, 2]$ and depends on the moments of the underlying distribution. Our estimator is formed by minimizing the $L_\gamma$-loss with respect to the data, for a power $\gamma \geq 2$ chosen in a data-driven way, by minimizing a criterion motivated by the asymptotic variance. Our approach applies directly to the regression setting, where $\theta_0$ is a function of observed features, and motivates the use of the $L_\gamma$ loss with a data-driven $\gamma$ in certain settings.
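As a concrete check of the rate gap in the uniform example, the following Monte Carlo sketch (not from the paper; all parameter values are illustrative) compares the sample midrange with the sample mean:

```python
import numpy as np

# Monte Carlo comparison of the sample midrange and the sample mean
# for Y_i ~ Uniform[theta0 - c, theta0 + c] (illustrative parameters).
rng = np.random.default_rng(0)
theta0, c, n, reps = 5.0, 1.0, 1000, 2000

err_midrange, err_mean = [], []
for _ in range(reps):
    y = rng.uniform(theta0 - c, theta0 + c, size=n)
    midrange = (y.min() + y.max()) / 2  # average of extreme order statistics
    err_midrange.append(abs(midrange - theta0))
    err_mean.append(abs(y.mean() - theta0))

# Expected errors: midrange is O(1/n) (bounded by 2c/n); mean is O(1/sqrt(n)).
print(f"midrange error: {np.mean(err_midrange):.2e}")  # roughly c/n
print(f"mean error:     {np.mean(err_mean):.2e}")      # roughly c*sqrt(2/(3*pi*n))
```

The abstract specifies the proposed estimator only at a high level: minimize the $L_\gamma$ loss and select $\gamma \geq 2$ by a criterion motivated by the asymptotic variance. The sketch below fills in that criterion with the standard M-estimation sandwich variance, $\mathbb{E}|Y-\theta|^{2\gamma-2} / \bigl((\gamma-1)\,\mathbb{E}|Y-\theta|^{\gamma-2}\bigr)^2$, estimated empirically at a pilot location estimate; this stand-in criterion, the candidate grid, and all function names are assumptions for illustration rather than the paper's exact procedure:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lgamma_estimate(y, gamma):
    """Minimize the L_gamma loss sum_i |y_i - theta|^gamma over theta.

    For gamma >= 1 the objective is convex in theta, so a bounded
    scalar minimization over the data range suffices.
    """
    obj = lambda theta: np.sum(np.abs(y - theta) ** gamma)
    res = minimize_scalar(obj, bounds=(y.min(), y.max()), method="bounded")
    return res.x

def plugin_variance(y, gamma, pilot):
    """Empirical sandwich-variance proxy for the L_gamma M-estimator.

    Stand-in for the paper's criterion (an assumption, not the paper's
    exact formula): E|Y-t|^(2g-2) / ((g-1) * E|Y-t|^(g-2))^2, with the
    moments estimated at a pilot location estimate t.
    """
    a = np.abs(y - pilot)
    num = np.mean(a ** (2 * gamma - 2))
    den = ((gamma - 1) * np.mean(a ** (gamma - 2))) ** 2
    return num / den

def adaptive_lgamma(y, gammas=(2, 3, 4, 6, 8)):
    """Pick the gamma >= 2 minimizing the variance proxy, then refit."""
    pilot = np.median(y)
    best_gamma = min(gammas, key=lambda g: plugin_variance(y, g, pilot))
    return lgamma_estimate(y, best_gamma), best_gamma

rng = np.random.default_rng(1)
y = rng.uniform(-1.0, 1.0, size=2000)  # symmetric around theta0 = 0
theta_hat, gamma_hat = adaptive_lgamma(y)
print(theta_hat, gamma_hat)  # large gamma favored for compact support
```

For the uniform case, the population version of this proxy is proportional to $1/(2\gamma - 1)$, which decreases in $\gamma$, so larger powers are selected; this is consistent with the midrange being the $\gamma \to \infty$ limit of the $L_\gamma$ minimizer.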

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-kao24a,
  title     = {Choosing the p in Lp Loss: Adaptive Rates for Symmetric Mean Estimation},
  author    = {Kao, Yu-Chun and Xu, Min and Zhang, Cun-Hui},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {2795--2839},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/kao24a/kao24a.pdf},
  url       = {https://proceedings.mlr.press/v247/kao24a.html}
}
Endnote
%0 Conference Paper
%T Choosing the p in Lp Loss: Adaptive Rates for Symmetric Mean Estimation
%A Yu-Chun Kao
%A Min Xu
%A Cun-Hui Zhang
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-kao24a
%I PMLR
%P 2795--2839
%U https://proceedings.mlr.press/v247/kao24a.html
%V 247
APA
Kao, Y., Xu, M. & Zhang, C. (2024). Choosing the p in Lp Loss: Adaptive Rates for Symmetric Mean Estimation. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:2795-2839. Available from https://proceedings.mlr.press/v247/kao24a.html.