Reinforcement Learning in Configurable Continuous Environments

Alberto Maria Metelli, Emanuele Ghelfi, Marcello Restelli
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4546-4555, 2019.

Abstract

Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of the usual MDP model to account for the possibility of configuring the environment to improve the agent’s performance. Currently, there is still no suitable algorithm to solve the learning problem for real-world Conf-MDPs. In this paper, we fill this gap by proposing a trust-region method, Relative Entropy Model Policy Search (REMPS), able to learn both the policy and the MDP configuration in continuous domains without requiring knowledge of the true model of the environment. After introducing our approach and providing a finite-sample analysis, we empirically evaluate REMPS on both benchmark and realistic environments, comparing our results with those of gradient methods.
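The abstract names a trust-region approach based on relative entropy. The snippet below is a minimal, hypothetical sketch of a REPS-style update of the kind REMPS builds on, not the paper's actual algorithm: sampled transitions are reweighted by exponentiated reward inside a KL trust region (optimization step), and a parametric policy is then fit to the reweighted samples by weighted maximum likelihood (projection step). The function names, the KL-bound parameter, and the linear-Gaussian policy choice are illustrative assumptions.

# Hypothetical REMPS/REPS-style update on sampled transitions (sketch only).
import numpy as np
from scipy.optimize import minimize_scalar

def optimize_eta(rewards, kl_bound):
    """Choose the temperature eta so that the exponentiated-reward
    reweighting stays within the KL trust region (REPS-style dual)."""
    shifted = rewards - rewards.max()  # shift for numerical stability
    def dual(eta):
        # g(eta) = eta * kl_bound + eta * log E[exp(r / eta)]  (up to a constant)
        return eta * kl_bound + eta * np.log(np.mean(np.exp(shifted / eta))) + rewards.max()
    res = minimize_scalar(dual, bounds=(1e-6, 1e6), method="bounded")
    return res.x

def remps_style_update(states, actions, rewards, kl_bound=0.1):
    """One optimization + projection step:
    1) reweight samples by exp(r / eta), with eta set by the KL trust region;
    2) project back onto a parametric (linear-Gaussian) policy by weighted MLE."""
    eta = optimize_eta(rewards, kl_bound)
    w = np.exp((rewards - rewards.max()) / eta)
    w /= w.sum()
    # Weighted maximum-likelihood fit of a policy a ~ N(K^T [s, 1], S)
    X = np.hstack([states, np.ones((len(states), 1))])
    Xw = X * w[:, None]
    K = np.linalg.solve(X.T @ Xw + 1e-6 * np.eye(X.shape[1]), Xw.T @ actions)
    resid = actions - X @ K
    S = (resid * w[:, None]).T @ resid
    return K, S, eta

In this sketch K maps the augmented state [s, 1] to the mean action; the same weighted-MLE projection would apply analogously to the environment-configuration parameters, which are omitted here.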

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-metelli19a,
  title     = {Reinforcement Learning in Configurable Continuous Environments},
  author    = {Metelli, Alberto Maria and Ghelfi, Emanuele and Restelli, Marcello},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4546--4555},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/metelli19a/metelli19a.pdf},
  url       = {https://proceedings.mlr.press/v97/metelli19a.html},
  abstract  = {Configurable Markov Decision Processes (Conf-MDPs) have been recently introduced as an extension of the usual MDP model to account for the possibility of configuring the environment to improve the agent’s performance. Currently, there is still no suitable algorithm to solve the learning problem for real-world Conf-MDPs. In this paper, we fill this gap by proposing a trust-region method, Relative Entropy Model Policy Search (REMPS), able to learn both the policy and the MDP configuration in continuous domains without requiring the knowledge of the true model of the environment. After introducing our approach and providing a finite-sample analysis, we empirically evaluate REMPS on both benchmark and realistic environments by comparing our results with those of the gradient methods.}
}
Endnote
%0 Conference Paper
%T Reinforcement Learning in Configurable Continuous Environments
%A Alberto Maria Metelli
%A Emanuele Ghelfi
%A Marcello Restelli
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-metelli19a
%I PMLR
%P 4546--4555
%U https://proceedings.mlr.press/v97/metelli19a.html
%V 97
%X Configurable Markov Decision Processes (Conf-MDPs) have been recently introduced as an extension of the usual MDP model to account for the possibility of configuring the environment to improve the agent’s performance. Currently, there is still no suitable algorithm to solve the learning problem for real-world Conf-MDPs. In this paper, we fill this gap by proposing a trust-region method, Relative Entropy Model Policy Search (REMPS), able to learn both the policy and the MDP configuration in continuous domains without requiring the knowledge of the true model of the environment. After introducing our approach and providing a finite-sample analysis, we empirically evaluate REMPS on both benchmark and realistic environments by comparing our results with those of the gradient methods.
APA
Metelli, A.M., Ghelfi, E. & Restelli, M. (2019). Reinforcement Learning in Configurable Continuous Environments. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4546-4555. Available from https://proceedings.mlr.press/v97/metelli19a.html.