Mixing Time of the Proximal Sampler in Relative Fisher Information via Strong Data Processing Inequality (Extended Abstract)
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:5716-5717, 2025.
Abstract
We study sampling from a probability distribution $\nu \propto e^{-f}$ on $\mathbb{R}^d$ from the perspective of minimizing the relative entropy (KL divergence) $H_\nu(\rho)$ over the space of probability distributions equipped with the Wasserstein geometry. The Langevin dynamics is a continuous-time stochastic process in $\mathbb{R}^d$ that implements the Wasserstein gradient flow for minimizing $H_\nu$, and the relative Fisher information has a geometric interpretation as the squared norm of the Wasserstein gradient of the relative entropy. When $\nu$ is strongly log-concave, the relative entropy $H_\nu$ is a strongly convex function, and the Langevin dynamics has a fast convergence guarantee in relative Fisher information. In discrete time, we study the Proximal Sampler, a two-step Gibbs sampling algorithm that samples from an auxiliary joint distribution whose $x$-marginal is the original target distribution. The Proximal Sampler can be seen as an approximate proximal discretization of the Langevin dynamics, and it has convergence rates matching those of the continuous-time Langevin dynamics in many settings, for example an exponential convergence rate in KL divergence under a log-Sobolev inequality. In this work, we show that when $\nu$ is $\alpha$-strongly log-concave, the Proximal Sampler also converges exponentially fast in relative Fisher information. From this we deduce an iteration complexity guarantee for the Proximal Sampler in relative Fisher information when the target is strongly log-concave and log-smooth. Our analysis proceeds by establishing strong data processing inequalities (SDPIs) for a family of Fokker-Planck channels driven by diffusion processes, including the Gaussian channel, the Ornstein-Uhlenbeck (OU) channel, the Langevin dynamics, and the reverse Gaussian channel. We show that even along the Gaussian channel, the data processing inequality (DPI) in relative Fisher information may fail to hold when the second distribution is arbitrary. We then show that along the Gaussian channel, the (S)DPI in relative Fisher information holds when the second distribution is (strongly) log-concave, and that the SDPI in relative Fisher information eventually holds when the second distribution is a log-Lipschitz perturbation of a strongly log-concave distribution. Along the Ornstein-Uhlenbeck channel, we show that the SDPI in relative Fisher information eventually holds when the second distribution is strongly log-concave, and we exhibit an example in which the DPI initially fails even when both input distributions are Gaussian. For our algorithmic result, we write the Proximal Sampler as a composition of the Gaussian channel and the reverse Gaussian channel; combining the SDPI for the Gaussian channel under strong log-concavity with the DPI for the reverse Gaussian channel shows that the relative Fisher information converges exponentially fast along the Proximal Sampler.
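For context, the two functionals discussed above can be written explicitly; the following display is a standard formulation rather than a quotation from the paper. For a probability density $\rho$ on $\mathbb{R}^d$, the relative entropy and relative Fisher information with respect to $\nu \propto e^{-f}$ are
$$ H_\nu(\rho) = \int_{\mathbb{R}^d} \rho(x) \log \frac{\rho(x)}{\nu(x)} \, dx, \qquad J_\nu(\rho) = \int_{\mathbb{R}^d} \rho(x) \left\| \nabla \log \frac{\rho(x)}{\nu(x)} \right\|^2 dx. $$
Along the Langevin dynamics $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$, whose law $\rho_t$ solves the Fokker-Planck equation $\partial_t \rho_t = \nabla \cdot (\rho_t \nabla f) + \Delta \rho_t$, the identity $\frac{d}{dt} H_\nu(\rho_t) = -J_\nu(\rho_t)$ expresses $J_\nu$ as the squared norm of the Wasserstein gradient of $H_\nu$; when $\nu$ is $\alpha$-strongly log-concave, both $H_\nu(\rho_t)$ and $J_\nu(\rho_t)$ decay exponentially, at a rate of order $e^{-2\alpha t}$.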
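The Proximal Sampler referenced above admits a compact description; the following is the standard presentation with step size $\eta > 0$, included here only as a sketch. The algorithm targets the joint distribution
$$ \pi(x, y) \propto \exp\!\Big( -f(x) - \frac{\|x - y\|^2}{2\eta} \Big), $$
whose $x$-marginal is $\nu \propto e^{-f}$, and alternates two Gibbs steps: a forward step $y_k \mid x_k \sim \pi^{Y \mid X}(\cdot \mid x_k) = \mathcal{N}(x_k, \eta I_d)$, which is the Gaussian channel, and a backward step $x_{k+1} \mid y_k \sim \pi^{X \mid Y}(\cdot \mid y_k) \propto \exp\!\big(-f(x) - \|x - y_k\|^2/(2\eta)\big)$, implemented by a restricted Gaussian oracle and corresponding to the reverse Gaussian channel.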
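The composition argument sketched in the abstract can be summarized schematically as follows; the contraction constant is left unspecified here, since the precise rate is part of the paper's contribution. For a Markov channel $P$, the DPI in relative Fisher information reads $J_{\nu P}(\rho P) \le J_\nu(\rho)$, and an SDPI sharpens this to
$$ J_{\nu P}(\rho P) \le c \, J_\nu(\rho) \qquad \text{for some } c < 1. $$
Since one iteration of the Proximal Sampler composes the forward Gaussian channel (mapping the $x$-marginals to the $y$-marginals) with the reverse Gaussian channel (mapping them back), applying the SDPI to the first step when $\nu$ is strongly log-concave and the DPI to the second step yields $J_\nu(\rho_{k+1}) \le c\, J_\nu(\rho_k)$, and hence $J_\nu(\rho_k) \le c^k J_\nu(\rho_0)$.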