Primal-Dual Stochastic Mirror Descent for MDPs

Daniil Tiapkin; Alexander Gasnikov

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:9723-9740, 2022.

Abstract

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, we propose a variant of Stochastic Mirror Descent for convex programming problems with Lipschitz-continuous functionals. An important feature is that the method can use inexact values of the functional constraints and compute the values of the dual variables. We analyze this algorithm in the general case and obtain a convergence-rate estimate that does not accumulate errors as the method runs. Using this algorithm, we obtain the first parallel algorithm for mixing average-reward MDPs with a generative model that does not rely on a reduction to discounted MDPs. One of the main features of the presented method is its low communication cost in a distributed centralized setting, even for very large networks.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-tiapkin22a,
  title = 	 {Primal-Dual Stochastic Mirror Descent for MDPs},
  author =       {Tiapkin, Daniil and Gasnikov, Alexander},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {9723--9740},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/tiapkin22a/tiapkin22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/tiapkin22a.html},
  abstract = 	 { We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for mixing average-reward MDPs with a generative model without reduction to discounted MDP. One of the main features of the presented method is low communication costs in a distributed centralized setting, even with very large networks. }
}

Endnote

%0 Conference Paper
%T Primal-Dual Stochastic Mirror Descent for MDPs
%A Daniil Tiapkin
%A Alexander Gasnikov
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-tiapkin22a
%I PMLR
%P 9723--9740
%U https://proceedings.mlr.press/v151/tiapkin22a.html
%V 151
%X  We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method. Using this algorithm, we get the first parallel algorithm for mixing average-reward MDPs with a generative model without reduction to discounted MDP. One of the main features of the presented method is low communication costs in a distributed centralized setting, even with very large networks.

APA


Tiapkin, D. & Gasnikov, A. (2022). Primal-Dual Stochastic Mirror Descent for MDPs. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:9723-9740. Available from https://proceedings.mlr.press/v151/tiapkin22a.html.
