Structured Logconcave Sampling with a Restricted Gaussian Oracle

Yin Tat Lee, Ruoqi Shen, Kevin Tian
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:2993-3050, 2021.

Abstract

We give algorithms for sampling several structured logconcave families to high accuracy. We further develop a reduction framework, inspired by proximal point methods in convex optimization, which bootstraps samplers for regularized densities to generically improve dependences on problem conditioning $\kappa$ from polynomial to linear. A key ingredient in our framework is the notion of a "restricted Gaussian oracle" (RGO) for $g: \mathbb{R}^d \rightarrow \mathbb{R}$, which is a sampler for distributions whose negative log-likelihood sums a quadratic (in a multiple of the identity) and $g$. By combining our reduction framework with our new samplers, we obtain the following bounds for sampling structured distributions to total variation distance $\epsilon$. For composite densities $\exp(-f(x) - g(x))$, where $f$ has condition number $\kappa$ and convex (but possibly non-smooth) $g$ admits an RGO, we obtain a mixing time of $O(\kappa d \log^3\tfrac{\kappa d}{\epsilon})$, matching the state-of-the-art non-composite bound of Lee et al. No composite samplers with better mixing than general-purpose logconcave samplers were previously known. For logconcave finite sums $\exp(-F(x))$, where $F(x) = \tfrac{1}{n}\sum_{i \in [n]} f_i(x)$ has condition number $\kappa$, we give a sampler querying $\widetilde{O}(n + \kappa\max(d, \sqrt{nd}))$ gradient oracles to $\{f_i\}_{i \in [n]}$. (For convenience of exposition, the $\widetilde{O}$ notation hides logarithmic factors in the dimension $d$, problem conditioning $\kappa$, desired accuracy $\epsilon$, and summand count $n$, when applicable. A first-order (gradient) oracle for $f:\mathbb{R}^d \rightarrow \mathbb{R}$ returns $(f(x), \nabla f(x))$ on input $x$, and a zeroth-order (value) oracle returns only $f(x)$.) No high-accuracy samplers with nontrivial gradient query complexity were previously known. For densities with condition number $\kappa$, we give an algorithm obtaining mixing time $O(\kappa d \log^2\tfrac{\kappa d}{\epsilon})$, improving upon Lee et al. by a logarithmic factor with a significantly simpler analysis. We also show a zeroth-order algorithm attains the same query complexity.
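The abstract's central object admits a one-line definition. In our notation (with $\eta > 0$ a regularization parameter and $y \in \mathbb{R}^d$ a center, both labels of ours rather than the paper's), an RGO for $g$ returns a sample from the density

$$\pi_y(x) \propto \exp\left(-g(x) - \frac{1}{2\eta}\left\|x - y\right\|_2^2\right),$$

i.e., the negative log-likelihood is $g$ plus a quadratic whose Hessian is the multiple $\tfrac{1}{\eta} I$ of the identity. The reduction framework alternates a Gaussian perturbation of the current iterate with one RGO call, so that each inner sampling problem is well-regularized. Below is a minimal sketch of that alternation, assuming a user-supplied `rgo` callable with the semantics above; it illustrates the pattern only and is not the paper's full algorithm, which additionally Metropolizes and tunes parameters.

import numpy as np

def proximal_reduction_sketch(rgo, x0, eta, num_iters, rng=None):
    """Sketch of the alternating scheme behind the reduction framework.

    rgo(y, eta) is assumed to sample from the density proportional to
    exp(-g(x) - ||x - y||^2 / (2 * eta)); `rgo` is a hypothetical
    user-supplied oracle, not an API defined by the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        # Forward step: perturb the iterate by Gaussian noise of variance eta.
        y = x + np.sqrt(eta) * rng.standard_normal(x.shape)
        # Backward step: re-center via one restricted Gaussian oracle call.
        x = rgo(y, eta)
    return x

Smaller $\eta$ makes the inner density more strongly logconcave (hence easier for the RGO) at the price of more outer iterations; the paper's analysis balances this trade-off to obtain the stated linear-in-$\kappa$ mixing bounds.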

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-lee21a,
  title     = {Structured Logconcave Sampling with a Restricted Gaussian Oracle},
  author    = {Lee, Yin Tat and Shen, Ruoqi and Tian, Kevin},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {2993--3050},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/lee21a/lee21a.pdf},
  url       = {https://proceedings.mlr.press/v134/lee21a.html}
}
Endnote
%0 Conference Paper
%T Structured Logconcave Sampling with a Restricted Gaussian Oracle
%A Yin Tat Lee
%A Ruoqi Shen
%A Kevin Tian
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-lee21a
%I PMLR
%P 2993--3050
%U https://proceedings.mlr.press/v134/lee21a.html
%V 134
APA
Lee, Y.T., Shen, R. & Tian, K. (2021). Structured Logconcave Sampling with a Restricted Gaussian Oracle. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:2993-3050. Available from https://proceedings.mlr.press/v134/lee21a.html.
