The Value Function Polytope in Reinforcement Learning

Robert Dadashi, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1486-1495, 2019.

Abstract

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective and introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.
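
To make the result concrete, here is a minimal sketch (ours, not the authors' code; the two-state MDP, the random seed, and the helper name value_function are all hypothetical) that samples random policies in a small finite MDP, evaluates each one in closed form via V^pi = (I - gamma P^pi)^{-1} r^pi, and numerically checks the line theorem by varying the policy at a single state:

import numpy as np

np.random.seed(0)
n_states, n_actions, gamma = 2, 2, 0.9

# Hypothetical random MDP: P[s, a] is a distribution over next states,
# r[s, a] a deterministic reward.
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = np.random.uniform(-1.0, 1.0, size=(n_states, n_actions))

def value_function(pi):
    # Closed-form policy evaluation: V^pi = (I - gamma P^pi)^{-1} r^pi.
    P_pi = np.einsum('sa,san->sn', pi, P)  # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, r)    # expected per-state reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Each policy maps to the point (V(s0), V(s1)) in R^2; the image of all
# stochastic policies under this map is the value function polytope.
policies = np.random.dirichlet(np.ones(n_actions), size=(1000, n_states))
values = np.array([value_function(pi) for pi in policies])

# Line theorem check: policies that agree on all states except s=0 should
# trace a line segment in value space as the action mix at s=0 varies.
base = np.random.dirichlet(np.ones(n_actions), size=n_states)
seg = []
for t in np.linspace(0.0, 1.0, 5):
    pi = base.copy()
    pi[0] = np.array([t, 1.0 - t])  # interpolate the two actions at s=0
    seg.append(value_function(pi))
d = np.array(seg) - seg[0]
cross = d[1:, 0] * d[-1, 1] - d[1:, 1] * d[-1, 0]  # 2-D cross products
print(np.allclose(cross, 0.0, atol=1e-8))          # True: points are collinear

Plotting values as a scatter in the (V(s0), V(s1)) plane reproduces the kind of two-state polytope pictures the paper uses to visualize the dynamics of reinforcement learning algorithms.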

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-dadashi19a,
  title     = {The Value Function Polytope in Reinforcement Learning},
  author    = {Dadashi, Robert and Taiga, Adrien Ali and Roux, Nicolas Le and Schuurmans, Dale and Bellemare, Marc G.},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {1486--1495},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/dadashi19a/dadashi19a.pdf},
  url       = {https://proceedings.mlr.press/v97/dadashi19a.html}
}
Endnote
%0 Conference Paper
%T The Value Function Polytope in Reinforcement Learning
%A Robert Dadashi
%A Adrien Ali Taiga
%A Nicolas Le Roux
%A Dale Schuurmans
%A Marc G. Bellemare
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-dadashi19a
%I PMLR
%P 1486--1495
%U https://proceedings.mlr.press/v97/dadashi19a.html
%V 97
APA
Dadashi, R., Taiga, A.A., Roux, N.L., Schuurmans, D. & Bellemare, M.G. (2019). The Value Function Polytope in Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1486-1495. Available from https://proceedings.mlr.press/v97/dadashi19a.html.