Structured Logconcave Sampling with a Restricted Gaussian Oracle

Yin Tat Lee; Ruoqi Shen; Kevin Tian

Structured Logconcave Sampling with a Restricted Gaussian Oracle

Yin Tat Lee, Ruoqi Shen, Kevin Tian

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:2993-3050, 2021.

Abstract

We give algorithms for sampling several structured logconcave families to high accuracy. We further develop a reduction framework, inspired by proximal point methods in convex optimization, which bootstraps samplers for regularized densities to generically improve dependences on problem conditioning $\kappa$ from polynomial to linear. A key ingredient in our framework is the notion of a "restricted Gaussian oracle" (RGO) for $g: \mathbb{R}^d \rightarrow \mathbb{R}$, which is a sampler for distributions whose negative log-likelihood sums a quadratic (in a multiple of the identity) and $g$. By combining our reduction framework with our new samplers, we obtain the following bounds for sampling structured distributions to total variation distance $\eps$. For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\tfrac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound Lee et.\ al.\. No composite samplers with better mixing than general-purpose logconcave samplers were previously known. For logconcave finite sums $\exp(-F(x))$, where $F(x) = \tfrac{1}{n}\sum_{i \in [n]} f_i(x)$ has condition number $\kappa$, we give a sampler querying $\widetilde{O}(n + \kappa\max(d, \sqrt{nd}))$ gradient oracles\footnote{For convenience of exposition, the $\widetilde{O}$ notation hides logarithmic factors in the dimension $d$, problem conditioning $\kappa$, desired accuracy $\epsilon$, and summand count $n$ (when applicable). A first-order (gradient) oracle for $f:\mathbb{R}^d \rightarrow \mathbb{R}$ returns $(f(x), \nabla f(x))$ on input $x$, and a zeroth-order (value) oracle only returns $f(x)$.} to $\{f_i\}_{i \in [n]}$. No high-accuracy samplers with nontrivial gradient query complexity were previously known. For densities with condition number $\kappa$, we give an algorithm obtaining mixing time $O(\kappa d \log^2\tfrac{\kappa d}{\epsilon})$, improving Lee et.\ al.\ by a logarithmic factor with a significantly simpler analysis. We also show a zeroth-order algorithm attains the same query complexity.

Cite this Paper

BibTeX

@InProceedings{pmlr-v134-lee21a,
  title = 	 {Structured Logconcave Sampling with a Restricted Gaussian Oracle},
  author =       {Lee, Yin Tat and Shen, Ruoqi and Tian, Kevin},
  booktitle = 	 {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = 	 {2993--3050},
  year = 	 {2021},
  editor = 	 {Belkin, Mikhail and Kpotufe, Samory},
  volume = 	 {134},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v134/lee21a/lee21a.pdf},
  url = 	 {https://proceedings.mlr.press/v134/lee21a.html},
  abstract = 	 {We give algorithms for sampling several structured logconcave families to high accuracy. We further develop a reduction framework, inspired by proximal point methods in convex optimization, which bootstraps samplers for regularized densities to generically improve dependences on problem conditioning $\kappa$ from polynomial to linear. A key ingredient in our framework is the notion of a "restricted Gaussian oracle" (RGO) for $g: \mathbb{R}^d \rightarrow \mathbb{R}$, which is a sampler for distributions whose negative log-likelihood sums a quadratic (in a multiple of the identity) and $g$. By combining our reduction framework with our new samplers, we obtain the following bounds for sampling structured distributions to total variation distance $\eps$.
For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\tfrac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound Lee et.\ al.\. No composite samplers with better mixing than general-purpose logconcave samplers were previously known.
For logconcave finite sums $\exp(-F(x))$, where $F(x) = \tfrac{1}{n}\sum_{i \in [n]} f_i(x)$ has condition number $\kappa$, we give a sampler querying $\widetilde{O}(n + \kappa\max(d, \sqrt{nd}))$ gradient oracles\footnote{For convenience of exposition, the $\widetilde{O}$ notation hides logarithmic factors in the dimension $d$, problem conditioning $\kappa$, desired accuracy $\epsilon$, and summand count $n$ (when applicable). A first-order (gradient) oracle for $f:\mathbb{R}^d \rightarrow \mathbb{R}$ returns $(f(x), \nabla f(x))$ on input $x$, and a zeroth-order (value) oracle only returns $f(x)$.} to $\{f_i\}_{i \in [n]}$. No high-accuracy samplers with nontrivial gradient query complexity were previously known.
For densities with condition number $\kappa$, we give an algorithm obtaining mixing time $O(\kappa d \log^2\tfrac{\kappa d}{\epsilon})$, improving Lee et.\ al.\ by a logarithmic factor with a significantly simpler analysis. We also show a zeroth-order algorithm attains the same query complexity.}
}

Endnote

%0 Conference Paper
%T Structured Logconcave Sampling with a Restricted Gaussian Oracle
%A Yin Tat Lee
%A Ruoqi Shen
%A Kevin Tian
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe	
%F pmlr-v134-lee21a
%I PMLR
%P 2993--3050
%U https://proceedings.mlr.press/v134/lee21a.html
%V 134
%X We give algorithms for sampling several structured logconcave families to high accuracy. We further develop a reduction framework, inspired by proximal point methods in convex optimization, which bootstraps samplers for regularized densities to generically improve dependences on problem conditioning $\kappa$ from polynomial to linear. A key ingredient in our framework is the notion of a "restricted Gaussian oracle" (RGO) for $g: \mathbb{R}^d \rightarrow \mathbb{R}$, which is a sampler for distributions whose negative log-likelihood sums a quadratic (in a multiple of the identity) and $g$. By combining our reduction framework with our new samplers, we obtain the following bounds for sampling structured distributions to total variation distance $\eps$.
For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\tfrac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound Lee et.\ al.\. No composite samplers with better mixing than general-purpose logconcave samplers were previously known.
For logconcave finite sums $\exp(-F(x))$, where $F(x) = \tfrac{1}{n}\sum_{i \in [n]} f_i(x)$ has condition number $\kappa$, we give a sampler querying $\widetilde{O}(n + \kappa\max(d, \sqrt{nd}))$ gradient oracles\footnote{For convenience of exposition, the $\widetilde{O}$ notation hides logarithmic factors in the dimension $d$, problem conditioning $\kappa$, desired accuracy $\epsilon$, and summand count $n$ (when applicable). A first-order (gradient) oracle for $f:\mathbb{R}^d \rightarrow \mathbb{R}$ returns $(f(x), \nabla f(x))$ on input $x$, and a zeroth-order (value) oracle only returns $f(x)$.} to $\{f_i\}_{i \in [n]}$. No high-accuracy samplers with nontrivial gradient query complexity were previously known.
For densities with condition number $\kappa$, we give an algorithm obtaining mixing time $O(\kappa d \log^2\tfrac{\kappa d}{\epsilon})$, improving Lee et.\ al.\ by a logarithmic factor with a significantly simpler analysis. We also show a zeroth-order algorithm attains the same query complexity.

APA

Lee, Y.T., Shen, R. & Tian, K.. (2021). Structured Logconcave Sampling with a Restricted Gaussian Oracle. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:2993-3050 Available from https://proceedings.mlr.press/v134/lee21a.html.

Related Material

Download PDF