On the Convergence of Langevin Monte Carlo: The Interplay between Tail Growth and Smoothness

Murat A Erdogdu; Rasa Hosseinzadeh

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:1776-1822, 2021.

Abstract

We study sampling from a target distribution $\nu_* = e^{-f}$ using the unadjusted Langevin Monte Carlo (LMC) algorithm. For any potential function $f$ whose tails behave like $\|x\|^\alpha$ for ${\alpha \in [1,2]}$ and whose gradient is $\beta$-Hölder continuous, we prove that $\widetilde{\mathcal{O}} \Big(d^{\frac{1}{\beta}+\frac{1+\beta}{\beta}(\frac{2}{\alpha}-{1}_{\{\alpha \neq 1\}})} \epsilon^{-\frac{1}{\beta}}\Big)$ steps are sufficient to reach the $\epsilon$-neighborhood of a $d$-dimensional target distribution $\nu_*$ in KL-divergence. This bound, in terms of $\epsilon$ dependency, is not directly influenced by the tail growth rate $\alpha$ of the potential function as long as its growth is at least linear, and it only relies on the order of smoothness $\beta$. One notable consequence of this result is that for potentials with Lipschitz gradient, i.e. $\beta=1$, the above rate recovers the best known rate $\widetilde{\mathcal{O}} (d\epsilon^{-1})$ which was established for strongly convex potentials in terms of $\epsilon$ dependency, but we show that the same rate is achievable for a wider class of potentials that are degenerately convex at infinity. The growth rate $\alpha$ affects the rate estimate in high dimensions where $d$ is large; furthermore, it recovers the best-known dimension dependency when the tail growth of the potential is quadratic, i.e. $\alpha = 2$, in the current setup.
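As a rough illustration of the quantities in the abstract, the sketch below evaluates the exponent of $d$ in the stated bound and runs the textbook unadjusted Langevin update $x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\,\xi_k$ with standard Gaussian $\xi_k$. The function names `rate_exponent` and `lmc`, the step size, and the quadratic example potential are illustrative assumptions, not taken from the paper; the check that $\beta > 0$ is likewise an assumption, since the abstract does not state the admissible range of $\beta$.

```python
import math
import random


def rate_exponent(alpha, beta):
    """Exponent of d in the abstract's bound:
    1/beta + (1 + beta)/beta * (2/alpha - 1{alpha != 1}).
    """
    if not 1.0 <= alpha <= 2.0:
        raise ValueError("the abstract assumes alpha in [1, 2]")
    if beta <= 0.0:
        # Assumption: the abstract does not spell out the range of beta.
        raise ValueError("beta must be positive")
    indicator = 0.0 if alpha == 1.0 else 1.0
    return 1.0 / beta + (1.0 + beta) / beta * (2.0 / alpha - indicator)


def lmc(grad_f, x0, step, n_steps, rng):
    """Textbook unadjusted Langevin iteration (hypothetical helper).

    grad_f maps a list of floats to the gradient of the potential f,
    step is the step size eta, rng is a random.Random instance.
    """
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad_f(x)
        # Gradient step on f plus Gaussian noise with variance 2 * step.
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Quadratic tails with Lipschitz gradient: alpha = 2, beta = 1.
    print(rate_exponent(2.0, 1.0))
    # Linear tails with Lipschitz gradient: alpha = 1, beta = 1.
    print(rate_exponent(1.0, 1.0))
    # Illustrative potential f(x) = ||x||^2 / 2, so grad f(x) = x.
    sample = lmc(lambda x: list(x), [0.0, 0.0, 0.0], 0.01, 1000,
                 random.Random(0))
    print(sample)
```

With $\alpha = 2$ and $\beta = 1$ the exponent evaluates to 1, which matches the $\widetilde{\mathcal{O}}(d\epsilon^{-1})$ rate quoted in the abstract; with $\alpha = 1$ and $\beta = 1$ the same formula gives 5, showing how slower tail growth worsens only the dimension dependency.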

Cite this Paper

BibTeX


@InProceedings{pmlr-v134-erdogdu21a,
  title = 	 {On the Convergence of Langevin Monte Carlo: The Interplay between Tail Growth and Smoothness},
  author =       {Erdogdu, Murat A and Hosseinzadeh, Rasa},
  booktitle = 	 {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = 	 {1776--1822},
  year = 	 {2021},
  editor = 	 {Belkin, Mikhail and Kpotufe, Samory},
  volume = 	 {134},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v134/erdogdu21a/erdogdu21a.pdf},
  url = 	 {https://proceedings.mlr.press/v134/erdogdu21a.html},
  abstract = 	 {We study sampling from a target distribution $\nu_* = e^{-f}$ using the unadjusted Langevin Monte Carlo (LMC) algorithm.  For any potential function $f$ whose tails behave like $\|x\|^\alpha$ for ${\alpha \in [1,2]}$, and has $\beta$-Hölder continuous gradient, we prove that $\widetilde{\mathcal{O}} \Big(d^{\frac{1}{\beta}+\frac{1+\beta}{\beta}(\frac{2}{\alpha}-{1}_{\{\alpha \neq 1\}})} \epsilon^{-\frac{1}{\beta}}\Big)$ steps are sufficient to reach the $\epsilon$-neighborhood of a $d$-dimensional target distribution $\nu_*$ in KL-divergence.  This bound, in terms of $\epsilon$ dependency, is not directly influenced by the tail growth rate $\alpha$ of the potential function as long as its growth is at least linear, and it only relies on the order of smoothness $\beta$.  One notable consequence of this result is that for potentials with Lipschitz gradient, i.e. $\beta=1$, the above rate recovers the best known rate $\widetilde{\mathcal{O}} (d\epsilon^{-1})$ which was established for strongly convex potentials in terms of $\epsilon$ dependency, but we show that the same rate is achievable for a wider class of potentials that are degenerately convex at infinity.  The growth rate $\alpha$ affects the rate estimate in high dimensions where $d$ is large; furthermore, it recovers the best-known dimension dependency when the tail growth of the potential is quadratic, i.e. $\alpha = 2$, in the current setup.}
}

Endnote

%0 Conference Paper
%T On the Convergence of Langevin Monte Carlo: The Interplay between Tail Growth and Smoothness
%A Murat A Erdogdu
%A Rasa Hosseinzadeh
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-erdogdu21a
%I PMLR
%P 1776--1822
%U https://proceedings.mlr.press/v134/erdogdu21a.html
%V 134
%X We study sampling from a target distribution $\nu_* = e^{-f}$ using the unadjusted Langevin Monte Carlo (LMC) algorithm.  For any potential function $f$ whose tails behave like $\|x\|^\alpha$ for ${\alpha \in [1,2]}$, and has $\beta$-Hölder continuous gradient, we prove that $\widetilde{\mathcal{O}} \Big(d^{\frac{1}{\beta}+\frac{1+\beta}{\beta}(\frac{2}{\alpha}-{1}_{\{\alpha \neq 1\}})} \epsilon^{-\frac{1}{\beta}}\Big)$ steps are sufficient to reach the $\epsilon$-neighborhood of a $d$-dimensional target distribution $\nu_*$ in KL-divergence.  This bound, in terms of $\epsilon$ dependency, is not directly influenced by the tail growth rate $\alpha$ of the potential function as long as its growth is at least linear, and it only relies on the order of smoothness $\beta$.  One notable consequence of this result is that for potentials with Lipschitz gradient, i.e. $\beta=1$, the above rate recovers the best known rate $\widetilde{\mathcal{O}} (d\epsilon^{-1})$ which was established for strongly convex potentials in terms of $\epsilon$ dependency, but we show that the same rate is achievable for a wider class of potentials that are degenerately convex at infinity.  The growth rate $\alpha$ affects the rate estimate in high dimensions where $d$ is large; furthermore, it recovers the best-known dimension dependency when the tail growth of the potential is quadratic, i.e. $\alpha = 2$, in the current setup.

APA


Erdogdu, M. A. & Hosseinzadeh, R. (2021). On the Convergence of Langevin Monte Carlo: The Interplay between Tail Growth and Smoothness. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:1776-1822. Available from https://proceedings.mlr.press/v134/erdogdu21a.html.
