Batch Stationary Distribution Estimation

Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10203-10213, 2020.

Abstract

We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions. Classical simulation-based approaches assume access to the underlying process so that trajectories of sufficient length can be gathered to approximate stationary sampling. Instead, we consider an alternative setting where a \emph{fixed} set of transitions has been collected beforehand, by a separate, possibly unknown procedure. The goal is still to estimate properties of the stationary distribution, but without additional access to the underlying system. We propose a consistent estimator that is based on recovering a correction ratio function over the given data. In particular, we develop a variational power method (VPM) that provides provably consistent estimates under general conditions. In addition to unifying a number of existing approaches from different subfields, we also find that VPM yields significantly better estimates across a range of problems, including queueing, stochastic differential equations, post-processing MCMC, and off-policy evaluation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-wen20a, title = {Batch Stationary Distribution Estimation}, author = {Wen, Junfeng and Dai, Bo and Li, Lihong and Schuurmans, Dale}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {10203--10213}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/wen20a/wen20a.pdf}, url = {https://proceedings.mlr.press/v119/wen20a.html}, abstract = {We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions. Classical simulation-based approaches assume access to the underlying process so that trajectories of sufficient length can be gathered to approximate stationary sampling. Instead, we consider an alternative setting where a \emph{fixed} set of transitions has been collected beforehand, by a separate, possibly unknown procedure. The goal is still to estimate properties of the stationary distribution, but without additional access to the underlying system. We propose a consistent estimator that is based on recovering a correction ratio function over the given data. In particular, we develop a variational power method (VPM) that provides provably consistent estimates under general conditions. In addition to unifying a number of existing approaches from different subfields, we also find that VPM yields significantly better estimates across a range of problems, including queueing, stochastic differential equations, post-processing MCMC, and off-policy evaluation.} }
Endnote
%0 Conference Paper %T Batch Stationary Distribution Estimation %A Junfeng Wen %A Bo Dai %A Lihong Li %A Dale Schuurmans %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-wen20a %I PMLR %P 10203--10213 %U https://proceedings.mlr.press/v119/wen20a.html %V 119 %X We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions. Classical simulation-based approaches assume access to the underlying process so that trajectories of sufficient length can be gathered to approximate stationary sampling. Instead, we consider an alternative setting where a \emph{fixed} set of transitions has been collected beforehand, by a separate, possibly unknown procedure. The goal is still to estimate properties of the stationary distribution, but without additional access to the underlying system. We propose a consistent estimator that is based on recovering a correction ratio function over the given data. In particular, we develop a variational power method (VPM) that provides provably consistent estimates under general conditions. In addition to unifying a number of existing approaches from different subfields, we also find that VPM yields significantly better estimates across a range of problems, including queueing, stochastic differential equations, post-processing MCMC, and off-policy evaluation.
APA
Wen, J., Dai, B., Li, L. & Schuurmans, D.. (2020). Batch Stationary Distribution Estimation. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10203-10213 Available from https://proceedings.mlr.press/v119/wen20a.html.

Related Material