On the Convergence of Langevin Monte Carlo: The Interplay between Tail Growth and Smoothness

Murat A Erdogdu, Rasa Hosseinzadeh
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:1776-1822, 2021.

Abstract

We study sampling from a target distribution $\nu_* = e^{-f}$ using the unadjusted Langevin Monte Carlo (LMC) algorithm. For any potential function $f$ whose tails grow like $\|x\|^\alpha$ with $\alpha \in [1,2]$ and whose gradient is $\beta$-Hölder continuous, we prove that $\widetilde{\mathcal{O}} \Big(d^{\frac{1}{\beta}+\frac{1+\beta}{\beta}(\frac{2}{\alpha}-\mathbf{1}_{\{\alpha \neq 1\}})} \epsilon^{-\frac{1}{\beta}}\Big)$ steps suffice to reach an $\epsilon$-neighborhood of the $d$-dimensional target distribution $\nu_*$ in KL divergence. The $\epsilon$ dependency of this bound is not directly influenced by the tail growth rate $\alpha$ of the potential as long as the growth is at least linear; it relies only on the order of smoothness $\beta$. One notable consequence is that for potentials with Lipschitz gradient, i.e., $\beta=1$, the above rate recovers, in terms of $\epsilon$ dependency, the best known rate $\widetilde{\mathcal{O}} (d\epsilon^{-1})$, which was previously established for strongly convex potentials; we show that the same rate is achievable for the wider class of potentials that are only degenerately convex at infinity. The growth rate $\alpha$ affects the rate estimate in high dimensions, where $d$ is large; moreover, the bound recovers the best known dimension dependency when the tail growth of the potential is quadratic, i.e., $\alpha = 2$, in the current setup.
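For reference, the LMC iteration analyzed in the paper is the Euler–Maruyama discretization of the Langevin diffusion, $x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I_d)$. The following is a minimal sketch of this update, not the authors' code; the function names, the Gaussian example potential, and the step size are illustrative assumptions.

```python
import numpy as np

def lmc(grad_f, x0, step_size, n_steps, rng=None):
    """Unadjusted Langevin Monte Carlo:
    x_{k+1} = x_k - step_size * grad_f(x_k) + sqrt(2 * step_size) * N(0, I_d).
    Returns the full trajectory of iterates."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x - step_size * grad_f(x) + np.sqrt(2.0 * step_size) * noise
        samples[k] = x
    return samples

# Illustrative example (not from the paper): f(x) = ||x||^2 / 2, so the
# target nu_* is N(0, I_d). This potential has quadratic tail growth
# (alpha = 2) and a Lipschitz gradient (beta = 1).
if __name__ == "__main__":
    d = 10
    grad_f = lambda x: x  # gradient of ||x||^2 / 2
    samples = lmc(grad_f, x0=np.zeros(d), step_size=1e-2, n_steps=5000)
    print(samples[-1000:].mean(axis=0))  # should be close to the zero mean
```

For this example with $\alpha = 2$ and $\beta = 1$, the dimension exponent in the bound evaluates to $\frac{1}{\beta}+\frac{1+\beta}{\beta}(\frac{2}{\alpha}-1) = 1 + 2 \cdot 0 = 1$, matching the $\widetilde{\mathcal{O}} (d\epsilon^{-1})$ rate quoted above.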

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-erdogdu21a,
  title     = {On the Convergence of Langevin Monte Carlo: The Interplay between Tail Growth and Smoothness},
  author    = {Erdogdu, Murat A and Hosseinzadeh, Rasa},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {1776--1822},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/erdogdu21a/erdogdu21a.pdf},
  url       = {https://proceedings.mlr.press/v134/erdogdu21a.html}
}
APA
Erdogdu, M. A. & Hosseinzadeh, R. (2021). On the Convergence of Langevin Monte Carlo: The Interplay between Tail Growth and Smoothness. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:1776-1822. Available from https://proceedings.mlr.press/v134/erdogdu21a.html.
