Mode-constrained Model-based Reinforcement Learning via Gaussian Processes

Aidan Scannell, Carl Henrik Ek, Arthur Richards
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:3299-3314, 2023.

Abstract

Model-based reinforcement learning (RL) algorithms do not typically consider environments with multiple dynamic modes, where it is beneficial to avoid inoperable or undesirable modes. We present a model-based RL algorithm that constrains training to a single dynamic mode with high probability. This is a difficult problem because the mode constraint is a hidden variable associated with the environment’s dynamics. As such, it is 1) unknown a priori and 2) we do not observe its output from the environment, so cannot learn it with supervised learning. We present a nonparametric dynamic model which learns the mode constraint alongside the dynamic modes. Importantly, it learns latent structure that our planning scheme leverages to 1) enforce the mode constraint with high probability, and 2) escape local optima induced by the mode constraint. We validate our method by showing that it can solve a simulated quadcopter navigation task whilst providing a level of constraint satisfaction both during and after training.
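To make the high-level idea concrete, below is a minimal, illustrative sketch only, not the authors' implementation: it assumes a learned GP dynamics model and a learned mode-probability model are already available (stubbed here with off-the-shelf scikit-learn GPs and synthetic mode labels, whereas in the paper the mode assignments are latent and learned jointly with the dynamics), and it uses a simple random-shooting planner in place of the paper's planning scheme. The class name ModeConstrainedPlanner and the threshold delta are illustrative choices, not names from the paper.

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier, GaussianProcessRegressor


class ModeConstrainedPlanner:
    """Random-shooting planner that rejects plans likely to leave the desired mode."""

    def __init__(self, dynamics_gp, mode_clf, delta=0.05, horizon=10, n_candidates=128):
        self.dynamics_gp = dynamics_gp    # GP mapping (state, action) -> next state
        self.mode_clf = mode_clf          # model giving P(desired mode | state)
        self.delta = delta                # allowed probability of leaving the mode
        self.horizon = horizon
        self.n_candidates = n_candidates

    def plan(self, state, cost_fn, action_dim, rng):
        best_actions, best_cost = None, np.inf
        for _ in range(self.n_candidates):
            actions = rng.uniform(-1.0, 1.0, size=(self.horizon, action_dim))
            s, cost, feasible = np.asarray(state, dtype=float), 0.0, True
            for a in actions:
                s = self.dynamics_gp.predict(np.hstack([s, a])[None])[0]
                # Chance constraint: remain in the desired mode with prob >= 1 - delta.
                if self.mode_clf.predict_proba(s[None])[0, 1] < 1.0 - self.delta:
                    feasible = False
                    break
                cost += cost_fn(s, a)
            if feasible and cost < best_cost:
                best_actions, best_cost = actions, cost
        return best_actions  # None if no mode-satisfying plan was found


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 2-D state, 1-D action. The mode labels below are synthetic,
    # purely so the sketch runs end to end; in the paper they are latent.
    X = rng.normal(size=(200, 3))                    # (state, action) inputs
    Y = X[:, :2] + 0.1 * rng.normal(size=(200, 2))   # next states
    modes = (X[:, 0] > 0.0).astype(int)              # fake "desired mode" indicator

    dynamics_gp = GaussianProcessRegressor().fit(X, Y)
    mode_clf = GaussianProcessClassifier().fit(Y, modes)

    planner = ModeConstrainedPlanner(dynamics_gp, mode_clf)
    actions = planner.plan(np.zeros(2), lambda s, a: float(np.sum(s**2)), action_dim=1, rng=rng)
    print("mode-satisfying plan found:", actions is not None)

The sketch captures only the chance-constrained planning step; the paper's contribution lies in learning the mode constraint as a latent variable of the dynamics and in exploiting the learned latent structure to escape the local optima that such a constraint induces.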

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-scannell23a,
  title     = {Mode-constrained Model-based Reinforcement Learning via Gaussian Processes},
  author    = {Scannell, Aidan and Ek, Carl Henrik and Richards, Arthur},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {3299--3314},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/scannell23a/scannell23a.pdf},
  url       = {https://proceedings.mlr.press/v206/scannell23a.html},
  abstract  = {Model-based reinforcement learning (RL) algorithms do not typically consider environments with multiple dynamic modes, where it is beneficial to avoid inoperable or undesirable modes. We present a model-based RL algorithm that constrains training to a single dynamic mode with high probability. This is a difficult problem because the mode constraint is a hidden variable associated with the environment’s dynamics. As such, it is 1) unknown a priori and 2) we do not observe its output from the environment, so cannot learn it with supervised learning. We present a nonparametric dynamic model which learns the mode constraint alongside the dynamic modes. Importantly, it learns latent structure that our planning scheme leverages to 1) enforce the mode constraint with high probability, and 2) escape local optima induced by the mode constraint. We validate our method by showing that it can solve a simulated quadcopter navigation task whilst providing a level of constraint satisfaction both during and after training.}
}
Endnote
%0 Conference Paper
%T Mode-constrained Model-based Reinforcement Learning via Gaussian Processes
%A Aidan Scannell
%A Carl Henrik Ek
%A Arthur Richards
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-scannell23a
%I PMLR
%P 3299--3314
%U https://proceedings.mlr.press/v206/scannell23a.html
%V 206
%X Model-based reinforcement learning (RL) algorithms do not typically consider environments with multiple dynamic modes, where it is beneficial to avoid inoperable or undesirable modes. We present a model-based RL algorithm that constrains training to a single dynamic mode with high probability. This is a difficult problem because the mode constraint is a hidden variable associated with the environment’s dynamics. As such, it is 1) unknown a priori and 2) we do not observe its output from the environment, so cannot learn it with supervised learning. We present a nonparametric dynamic model which learns the mode constraint alongside the dynamic modes. Importantly, it learns latent structure that our planning scheme leverages to 1) enforce the mode constraint with high probability, and 2) escape local optima induced by the mode constraint. We validate our method by showing that it can solve a simulated quadcopter navigation task whilst providing a level of constraint satisfaction both during and after training.
APA
Scannell, A., Ek, C.H. & Richards, A. (2023). Mode-constrained Model-based Reinforcement Learning via Gaussian Processes. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:3299-3314. Available from https://proceedings.mlr.press/v206/scannell23a.html.