Model-based Reinforcement Learning for Parameterized Action Spaces

Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:58935-58954, 2024.

Abstract

We propose a novel model-based reinforcement learning algorithm, Dynamics Learning and predictive control with Parameterized Actions (DLPA), for Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral (MPPI) control. Through the lens of Lipschitz continuity, we theoretically quantify the difference in achieved value between the trajectory generated during planning and the optimal trajectory. Our empirical results on several standard benchmarks show that our algorithm achieves better sample efficiency and asymptotic performance than state-of-the-art PAMDP methods.
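To make the planning loop concrete, below is a minimal sketch of MPPI-style planning over parameterized actions, where each action is a discrete type paired with a continuous parameter. It uses NumPy and a toy hand-coded dynamics model in place of the learned one; all names here (step_model, mppi_plan, DIRECTIONS, GOAL) are hypothetical, and the paper's actual model architecture, reward and termination handling, and distribution updates differ in detail. This only illustrates the sample-rollout-reweight structure the abstract refers to.

import numpy as np

# Toy stand-in for a learned parameterized-action-conditioned dynamics
# model: the state is 2-D, each discrete action type k moves the state
# along a fixed direction scaled by that action's continuous parameter,
# and the reward is the negative distance to a goal.
DIRECTIONS = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
GOAL = np.array([3.0, 2.0])

def step_model(state, k, param):
    next_state = state + DIRECTIONS[k] * param
    reward = -np.linalg.norm(next_state - GOAL)
    return next_state, reward

def mppi_plan(state, horizon=5, n_samples=256, n_iters=3,
              temperature=1.0, n_types=3, seed=0):
    """MPPI-style planning over parameterized actions (sketch).

    Maintains, per planning step, a categorical distribution over the
    discrete action types and a Gaussian over each type's continuous
    parameter; both are re-fit to exponentially weighted returns.
    """
    rng = np.random.default_rng(seed)
    probs = np.full((horizon, n_types), 1.0 / n_types)
    mu = np.zeros((horizon, n_types))
    sigma = np.ones((horizon, n_types))

    for _ in range(n_iters):
        # Sample discrete types and continuous parameters for each rollout.
        types = np.stack([rng.choice(n_types, size=n_samples, p=probs[t])
                          for t in range(horizon)])
        rows = np.arange(horizon)[:, None]
        params = (mu[rows, types] + sigma[rows, types]
                  * rng.standard_normal((horizon, n_samples)))

        # Roll each sampled parameterized-action sequence through the model.
        returns = np.zeros(n_samples)
        for i in range(n_samples):
            s = state.copy()
            for t in range(horizon):
                s, r = step_model(s, types[t, i], params[t, i])
                returns[i] += r

        # Exponentially weight the rollouts by return (the MPPI update).
        weights = np.exp((returns - returns.max()) / temperature)
        weights /= weights.sum()

        # Re-fit the categorical and Gaussian sampling distributions.
        for t in range(horizon):
            for k in range(n_types):
                mask = types[t] == k
                w_k = weights[mask].sum()
                probs[t, k] = w_k
                if w_k > 1e-12:
                    w = weights[mask] / w_k
                    mu[t, k] = np.sum(w * params[t, mask])
                    sigma[t, k] = np.sqrt(
                        np.sum(w * (params[t, mask] - mu[t, k]) ** 2)) + 1e-3
            probs[t] = (probs[t] + 1e-3) / (probs[t] + 1e-3).sum()

    k0 = int(np.argmax(probs[0]))
    return k0, float(mu[0, k0])

# Example: plan one parameterized action from the origin.
k, x = mppi_plan(np.zeros(2))
print(f"discrete action {k}, continuous parameter {x:.2f}")

Maintaining a separate Gaussian per discrete type, rather than a single shared one, is one natural way to handle the hybrid action space; it keeps the continuous-parameter update from mixing statistics across action types.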

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhang24r,
  title     = {Model-based Reinforcement Learning for Parameterized Action Spaces},
  author    = {Zhang, Renhao and Fu, Haotian and Miao, Yilin and Konidaris, George},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {58935--58954},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24r/zhang24r.pdf},
  url       = {https://proceedings.mlr.press/v235/zhang24r.html}
}
Endnote
%0 Conference Paper
%T Model-based Reinforcement Learning for Parameterized Action Spaces
%A Renhao Zhang
%A Haotian Fu
%A Yilin Miao
%A George Konidaris
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhang24r
%I PMLR
%P 58935--58954
%U https://proceedings.mlr.press/v235/zhang24r.html
%V 235
APA
Zhang, R., Fu, H., Miao, Y. & Konidaris, G. (2024). Model-based Reinforcement Learning for Parameterized Action Spaces. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:58935-58954. Available from https://proceedings.mlr.press/v235/zhang24r.html.