Continuous-time Model-based Reinforcement Learning

Cagatay Yildiz, Markus Heinonen, Harri Lähdesmäki
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:12009-12018, 2021.

Abstract

Model-based reinforcement learning (MBRL) approaches rely on discrete-time state transition models whereas physical systems and the vast majority of control tasks operate in continuous-time. To avoid time-discretization approximation of the underlying process, we propose a continuous-time MBRL framework based on a novel actor-critic method. Our approach also infers the unknown state evolution differentials with Bayesian neural ordinary differential equations (ODE) to account for epistemic uncertainty. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems. Our experiments illustrate that the model is robust against irregular and noisy data, and can solve classic control problems in a sample-efficient manner.

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-yildiz21a,
  title = {Continuous-time Model-based Reinforcement Learning},
  author = {Yildiz, Cagatay and Heinonen, Markus and L{\"a}hdesm{\"a}ki, Harri},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {12009--12018},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/yildiz21a/yildiz21a.pdf},
  url = {https://proceedings.mlr.press/v139/yildiz21a.html},
  abstract = {Model-based reinforcement learning (MBRL) approaches rely on discrete-time state transition models whereas physical systems and the vast majority of control tasks operate in continuous-time. To avoid time-discretization approximation of the underlying process, we propose a continuous-time MBRL framework based on a novel actor-critic method. Our approach also infers the unknown state evolution differentials with Bayesian neural ordinary differential equations (ODE) to account for epistemic uncertainty. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems. Our experiments illustrate that the model is robust against irregular and noisy data, and can solve classic control problems in a sample-efficient manner.}
}
Endnote
%0 Conference Paper
%T Continuous-time Model-based Reinforcement Learning
%A Cagatay Yildiz
%A Markus Heinonen
%A Harri Lähdesmäki
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-yildiz21a
%I PMLR
%P 12009--12018
%U https://proceedings.mlr.press/v139/yildiz21a.html
%V 139
%X Model-based reinforcement learning (MBRL) approaches rely on discrete-time state transition models whereas physical systems and the vast majority of control tasks operate in continuous-time. To avoid time-discretization approximation of the underlying process, we propose a continuous-time MBRL framework based on a novel actor-critic method. Our approach also infers the unknown state evolution differentials with Bayesian neural ordinary differential equations (ODE) to account for epistemic uncertainty. We implement and test our method on a new ODE-RL suite that explicitly solves continuous-time control systems. Our experiments illustrate that the model is robust against irregular and noisy data, and can solve classic control problems in a sample-efficient manner.
APA
Yildiz, C., Heinonen, M., & Lähdesmäki, H. (2021). Continuous-time Model-based Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:12009-12018. Available from https://proceedings.mlr.press/v139/yildiz21a.html.