Action-Constrained Markov Decision Processes With Kullback-Leibler Cost
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1431-1444, 2018.
Abstract
This paper concerns computation of optimal policies in which the one-step reward function contains a cost term that models Kullback-Leibler divergence with respect to nominal dynamics. This technique was introduced by Todorov in 2007, who showed under general conditions that the solution to the average-reward optimality equations reduces to a simple eigenvector problem. Since then, many authors have sought to apply this technique to control problems and models of bounded rationality in economics. A crucial assumption is that the input process is essentially unconstrained. For example, if the nominal dynamics include randomness from nature (e.g., the impact of wind on a moving vehicle), then the optimal control solution does not respect the exogenous nature of this disturbance. This paper introduces a technique to solve a more general class of action-constrained MDPs. The main idea is to solve an entire parameterized family of MDPs, in which the parameter is a scalar weighting the one-step reward function. The approach is new and practical even in the original unconstrained formulation.
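As a rough illustration of the eigenvector reduction mentioned above, the following Python sketch computes the unconstrained (Todorov-style) solution on a finite state space, with a scalar weight zeta on the one-step reward standing in for the parameter of the family of MDPs described in the abstract. The function name, variable names, and power-iteration routine are illustrative assumptions for this sketch, not the paper's own algorithm, which additionally handles the action constraints.

```python
import numpy as np

def kl_optimal_policy(P0, U, zeta=1.0, iters=5000, tol=1e-12):
    """Eigenvector solution of an unconstrained KL-cost MDP (sketch).

    P0   : (n, n) nominal transition matrix, rows summing to one,
           assumed irreducible and aperiodic.
    U    : (n,)   one-step reward on states.
    zeta : scalar weight on the reward.

    Returns (eta, h, Pstar): eta is the optimal average of
    zeta*U minus the KL cost, h is the principal right eigenvector
    of the twisted kernel diag(exp(zeta*U)) @ P0, and Pstar is the
    optimal transition matrix with Pstar(x, x') proportional to
    P0(x, x') * h(x').
    """
    A = np.exp(zeta * U)[:, None] * P0            # twisted kernel
    h = np.ones(P0.shape[0])
    lam = 1.0
    for _ in range(iters):                        # power iteration
        h_new = A @ h
        lam_new = h_new.max()
        h_new = h_new / lam_new
        if np.abs(h_new - h).max() < tol and abs(lam_new - lam) < tol:
            h, lam = h_new, lam_new
            break
        h, lam = h_new, lam_new
    Pstar = P0 * h[None, :]                       # twist the nominal dynamics
    Pstar = Pstar / Pstar.sum(axis=1, keepdims=True)
    eta = np.log(lam)                             # log of the Perron eigenvalue
    return eta, h, Pstar
```

Sweeping zeta over a grid and solving the eigenvector problem at each value gives the kind of parameterized family of solutions the abstract refers to; the constrained formulation studied in the paper requires more than this unconstrained computation.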