Reinforcement Learning for Adaptive MCMC

Congye Wang; Wilson Ye Chen; Heishiro Kanagawa; Chris J. Oates

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:640-648, 2025.

Abstract

An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date it has remained unclear how to exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called \emph{Reinforcement Learning Metropolis–Hastings}, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis–Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that out-performs a popular gradient-free adaptive Metropolis–Hastings algorithm on $\approx$90% of tasks in the \emph{PosteriorDB} benchmark.
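
For readers unfamiliar with the setting, the following is a minimal sketch of a classical gradient-free adaptive Metropolis–Hastings sampler, included only to illustrate what "adaptive" and "control of the learning rate" mean here. It is not the paper's Reinforcement Learning Metropolis–Hastings method and uses no policy gradient. It is a random-walk sampler whose proposal scale is tuned by a Robbins–Monro update with a decaying learning rate. The toy target, the function names, the target acceptance rate of 0.44 and the decay exponent 0.6 are all assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Log-density (up to a constant) of a standard normal; an assumed toy target."""
    return -0.5 * x * x


def adaptive_rwm(n_steps, seed=0, target_accept=0.44):
    """Random-walk Metropolis-Hastings with a Robbins-Monro step-size update.

    Illustrative only: this is a classical adaptive scheme, not the
    policy-gradient method described in the abstract.
    """
    rng = random.Random(seed)
    x = 0.0
    log_step = 0.0  # log of the proposal standard deviation
    samples = []
    n_accept = 0
    for n in range(1, n_steps + 1):
        # Symmetric Gaussian proposal centred at the current state.
        proposal = x + math.exp(log_step) * rng.gauss(0.0, 1.0)
        log_ratio = log_target(proposal) - log_target(x)
        accept_prob = math.exp(min(0.0, log_ratio))
        if rng.random() < accept_prob:
            x = proposal
            n_accept += 1
        # The learning rate decays to zero, so adaptation diminishes over time.
        learning_rate = n ** -0.6
        # Widen the proposal when accepting too often, narrow it otherwise.
        log_step += learning_rate * (accept_prob - target_accept)
        samples.append(x)
    return samples, n_accept / n_steps, math.exp(log_step)


if __name__ == "__main__":
    draws, accept_rate, step = adaptive_rwm(20000)
    print(f"acceptance rate: {accept_rate:.3f}, final step size: {step:.3f}")
```

The decaying learning rate is the point of contact with the abstract: once the kernel stops changing appreciably, the chain behaves like a fixed Metropolis–Hastings sampler, which is the intuition behind ergodicity conditions for adaptive MCMC.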

Cite this Paper

BibTeX

@InProceedings{pmlr-v258-wang25b,
  title = 	 {Reinforcement Learning for Adaptive MCMC},
  author =       {Wang, Congye and Chen, Wilson Ye and Kanagawa, Heishiro and Oates, Chris J.},
  booktitle = 	 {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {640--648},
  year = 	 {2025},
  editor = 	 {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = 	 {258},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {03--05 May},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v258/main/assets/wang25b/wang25b.pdf},
  url = 	 {https://proceedings.mlr.press/v258/wang25b.html},
  abstract = 	 {An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task.  Yet, to date it has remained unclear how to exploit modern reinforcement learning technologies for adaptive MCMC.  The aim of this paper is to set out a general framework, called \emph{Reinforcement Learning Metropolis--Hastings}, that is theoretically supported and empirically validated.  Our principal focus is on learning fast-mixing Metropolis--Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient.  Control of the learning rate provably ensures conditions for ergodicity are satisfied.  The methodology is used to construct a gradient-free sampler that out-performs a popular gradient-free adaptive Metropolis--Hastings algorithm on $\approx$90\% of tasks in the \emph{PosteriorDB} benchmark.}
}

Endnote

%0 Conference Paper
%T Reinforcement Learning for Adaptive MCMC
%A Congye Wang
%A Wilson Ye Chen
%A Heishiro Kanagawa
%A Chris J. Oates
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-wang25b
%I PMLR
%P 640--648
%U https://proceedings.mlr.press/v258/wang25b.html
%V 258
%X An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task.  Yet, to date it has remained unclear how to exploit modern reinforcement learning technologies for adaptive MCMC.  The aim of this paper is to set out a general framework, called \emph{Reinforcement Learning Metropolis–Hastings}, that is theoretically supported and empirically validated.  Our principal focus is on learning fast-mixing Metropolis–Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient.  Control of the learning rate provably ensures conditions for ergodicity are satisfied.  The methodology is used to construct a gradient-free sampler that out-performs a popular gradient-free adaptive Metropolis–Hastings algorithm on $\approx$90% of tasks in the \emph{PosteriorDB} benchmark.

APA

Wang, C., Chen, W.Y., Kanagawa, H. & Oates, C.J. (2025). Reinforcement Learning for Adaptive MCMC. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:640-648. Available from https://proceedings.mlr.press/v258/wang25b.html.
