Optimal score estimation via empirical Bayes smoothing

Andre Wibisono, Yihong Wu, Kaylee Yingxi Yang
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4958-4991, 2024.

Abstract

We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde \Theta(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality where sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights in empirical Bayes theory as well as a new convergence rate of the smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, shown optimal by a matching minimax lower bound. We also discuss extensions to estimating $\beta$-Hölder continuous scores with $\beta \leq 1$, as well as the implication of our theory on the sample complexity of score-based generative models.
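As a concrete illustration of the estimator the abstract refers to (a minimal sketch, not code from the paper): the score of the Gaussian-smoothed empirical distribution $\hat\rho_n * \mathcal{N}(0, \sigma^2 I_d)$ has a closed form via the Tweedie-type identity $\nabla \log \rho_\sigma(y) = (\mathbb{E}[X \mid X + \sigma Z = y] - y)/\sigma^2$, which reduces to a softmax-weighted average of the samples. The NumPy sketch below implements this plain kernel score; the bandwidth tuning shown is an assumption for illustration, and the paper's additional regularization (which controls low-density regions and is needed to attain the optimal rate) is omitted.

    import numpy as np

    def smoothed_empirical_score(X, queries, sigma):
        """Score of the Gaussian-smoothed empirical distribution.

        Evaluates s_hat(y) = grad log((1/n) sum_i N(y; X_i, sigma^2 I))
        via the Tweedie-type identity
            s_hat(y) = (sum_i w_i(y) X_i - y) / sigma^2,
        with softmax weights w_i(y) prop. to exp(-||y - X_i||^2 / (2 sigma^2)).

        X       : (n, d) i.i.d. samples from the unknown rho*
        queries : (m, d) points at which to estimate the score
        sigma   : kernel bandwidth (hypothetical tuning: roughly
                  n^{-1/(d+4)} up to log factors in the Lipschitz case,
                  consistent with the stated n^{-2/(d+4)} rate)
        """
        diff = queries[:, None, :] - X[None, :, :]          # (m, n, d)
        logw = -np.sum(diff**2, axis=-1) / (2 * sigma**2)   # (m, n)
        logw -= logw.max(axis=1, keepdims=True)             # stabilize exp
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)                   # posterior weights
        return (w @ X - queries) / sigma**2                 # (m, d)

Note that this is the unregularized kernel score, which can blow up where the smoothed density is small; the paper's regularized variant modifies the estimator in those regions.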

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-wibisono24a,
  title     = {Optimal score estimation via empirical Bayes smoothing},
  author    = {Wibisono, Andre and Wu, Yihong and Yang, Kaylee Yingxi},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {4958--4991},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/wibisono24a/wibisono24a.pdf},
  url       = {https://proceedings.mlr.press/v247/wibisono24a.html},
  abstract  = {We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde \Theta(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality where sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights in empirical Bayes theory as well as a new convergence rate of smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, shown optimal by a matching minimax lower bound. We also discuss extensions to estimating $\beta$-Hölder continuous scores with $\beta \leq 1$, as well as the implication of our theory on the sample complexity of score-based generative models.}
}
Endnote
%0 Conference Paper
%T Optimal score estimation via empirical Bayes smoothing
%A Andre Wibisono
%A Yihong Wu
%A Kaylee Yingxi Yang
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-wibisono24a
%I PMLR
%P 4958--4991
%U https://proceedings.mlr.press/v247/wibisono24a.html
%V 247
%X We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde \Theta(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature, highlighting the curse of dimensionality where sample complexity for accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights in empirical Bayes theory as well as a new convergence rate of smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, shown optimal by a matching minimax lower bound. We also discuss extensions to estimating $\beta$-Hölder continuous scores with $\beta \leq 1$, as well as the implication of our theory on the sample complexity of score-based generative models.
APA
Wibisono, A., Wu, Y., & Yang, K. Y. (2024). Optimal score estimation via empirical Bayes smoothing. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4958-4991. Available from https://proceedings.mlr.press/v247/wibisono24a.html.