Action-Constrained Markov Decision Processes With Kullback-Leibler Cost

Ana Bušić, Sean Meyn
Proceedings of the 31st Conference On Learning Theory, PMLR 75:1431-1444, 2018.

Abstract

This paper concerns the computation of optimal policies in which the one-step reward function contains a cost term that models Kullback-Leibler divergence with respect to nominal dynamics. This technique was introduced by Todorov in 2007, where it was shown under general conditions that the solution to the average-reward optimality equations reduces to a simple eigenvector problem. Since then, many authors have sought to apply this technique to control problems and models of bounded rationality in economics. A crucial assumption is that the input process is essentially unconstrained. For example, if the nominal dynamics include randomness from nature (e.g., the impact of wind on a moving vehicle), then the optimal control solution does not respect the exogenous nature of this disturbance. This paper introduces a technique to solve a more general class of action-constrained MDPs. The main idea is to solve an entire parameterized family of MDPs, in which the parameter is a scalar weighting the one-step reward function. The approach is new and practical even in the original unconstrained formulation.
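The eigenvector reduction mentioned in the abstract can be made concrete for the unconstrained case. The sketch below is illustrative only, not the authors' algorithm: it assumes a finite state space, a reward-maximization sign convention, a power-iteration solver, and the hypothetical helper name solve_kl_mdp. Under Todorov's formulation, the scaled kernel P_hat(x, y) = exp(r(x)) P0(x, y) has Perron eigenvalue exp(eta*), with eta* the optimal average reward, and the optimal controlled kernel is the h-transform of the nominal dynamics by the corresponding eigenvector.

```python
import numpy as np

def solve_kl_mdp(P0, r, n_iter=2000):
    """Sketch of the unconstrained KL-cost solution (illustrative, not the paper's code).

    P0 : (n, n) nominal transition matrix; r : (n,) one-step reward.
    Returns the optimal average reward, a relative value function, and
    the optimal controlled transition matrix.
    """
    P_hat = np.exp(r)[:, None] * P0            # scaled kernel: exp(r(x)) * P0(x, y)
    v = np.ones(len(r))
    for _ in range(n_iter):                    # power iteration for the (right)
        v = P_hat @ v                          # Perron eigenvector of P_hat
        lam = np.linalg.norm(v)
        v /= lam
    eta = np.log(lam)                          # optimal average reward
    h = np.log(v)                              # relative value function, up to an additive constant
    P_opt = P0 * v[None, :]                    # h-transform of the nominal dynamics
    P_opt /= P_opt.sum(axis=1, keepdims=True)  # optimal controlled kernel
    return eta, h, P_opt
```

In this toy setting, the parameterized family described in the abstract amounts to sweeping a scalar weight kappa and solving solve_kl_mdp(P0, kappa * r) for each value; the paper's contribution is a technique that handles the action-constrained case, which this unconstrained sketch does not address.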

Cite this Paper


BibTeX
@InProceedings{pmlr-v75-busic18a,
  title     = {Action-Constrained {Markov Decision Processes} With {Kullback-Leibler} Cost},
  author    = {Bu\v{s}i\'{c}, Ana and Meyn, Sean},
  booktitle = {Proceedings of the 31st Conference On Learning Theory},
  pages     = {1431--1444},
  year      = {2018},
  editor    = {Bubeck, Sébastien and Perchet, Vianney and Rigollet, Philippe},
  volume    = {75},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v75/busic18a/busic18a.pdf},
  url       = {https://proceedings.mlr.press/v75/busic18a.html}
}
APA
Bušić, A. & Meyn, S. (2018). Action-Constrained Markov Decision Processes With Kullback-Leibler Cost. Proceedings of the 31st Conference On Learning Theory, in Proceedings of Machine Learning Research 75:1431-1444. Available from https://proceedings.mlr.press/v75/busic18a.html.
