Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice

Ângelo G. Lovatto, Thiago P. Bueno, Denis D. Mauá, Leliane N. Barros
Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, PMLR 137:76-86, 2020.

Abstract

Actor-Critic methods are a prominent class of modern reinforcement learning algorithms based on the classic Policy Iteration procedure. Despite many successful cases, Actor-Critic methods tend to require a very large number of experiences and can be quite unstable. Recent approaches advocate learning and using a world model to improve sample efficiency and reduce reliance on the value function estimate. However, learning an accurate dynamics model of the world remains challenging, often requiring computationally costly and data-hungry models. More recent work has shown that learning an everywhere-accurate model is unnecessary and often detrimental to the overall task; instead, the agent should improve the world model in task-critical regions. For example, Iterative Value-Aware Model Learning extends model-based value iteration by incorporating the value function estimate into the model loss function, showing that improvements in this model objective translate into improved performance on the end task. It therefore seems natural to expect that model-based Actor-Critic methods can benefit equally from learning value-aware models, improving overall task performance or reducing the need for large, expensive models. However, we show empirically that combining Actor-Critic methods with value-aware model learning can be quite difficult, and that naive approaches such as maximum likelihood estimation often achieve superior performance at lower computational cost. Our results suggest that, despite theoretical guarantees, learning a value-aware model in continuous domains does not ensure better performance on the overall task.
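
The page itself carries no code, but a rough sketch may help clarify the contrast the abstract draws between maximum likelihood model learning and a value-aware (IterVAML-style) objective. The snippet below is a hypothetical PyTorch-style illustration, not the authors' implementation; the names `model`, `value_fn`, `batch`, and `n_samples` are assumptions introduced here for exposition.

```python
# Hypothetical sketch: a maximum-likelihood model loss vs. an IterVAML-style
# value-aware loss. Names and shapes are illustrative assumptions only.
import torch


def mle_loss(model, batch):
    """Standard negative log-likelihood of observed transitions."""
    # model(obs, act) is assumed to return a torch.distributions.Distribution
    # over next states (e.g., a diagonal Gaussian).
    dist = model(batch["obs"], batch["act"])
    return -dist.log_prob(batch["next_obs"]).mean()


def value_aware_loss(model, value_fn, batch, n_samples=4):
    """IterVAML-style loss: squared gap between the critic's value at the
    observed next state and its expected value under the learned model."""
    dist = model(batch["obs"], batch["act"])
    # Monte Carlo estimate of E_{s' ~ model}[V(s')]; rsample keeps gradients
    # flowing into the model parameters.
    sampled_next = dist.rsample((n_samples,))             # (n_samples, batch, obs_dim)
    expected_value = value_fn(sampled_next).mean(dim=0)   # (batch, 1)
    # The critic is treated as fixed during the model update, so the target
    # value at the observed next state is detached.
    target_value = value_fn(batch["next_obs"]).detach()   # (batch, 1)
    return (expected_value - target_value).pow(2).mean()
```

In a value-aware setup of this kind, the critic is held fixed while the model is fitted, and the model is then used for value or policy updates. The paper's empirical finding is that, in the Actor-Critic setting, the plain maximum-likelihood variant above is often the stronger and cheaper baseline.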

Cite this Paper


BibTeX
@InProceedings{pmlr-v137-lovatto20a,
  title     = {Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice},
  author    = {Lovatto, \^{A}ngelo G. and Bueno, Thiago P. and Mau\'{a}, Denis D. and de Barros, Leliane N.},
  booktitle = {Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops},
  pages     = {76--86},
  year      = {2020},
  editor    = {Zosa Forde, Jessica and Ruiz, Francisco and Pradier, Melanie F. and Schein, Aaron},
  volume    = {137},
  series    = {Proceedings of Machine Learning Research},
  month     = {12 Dec},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v137/lovatto20a/lovatto20a.pdf},
  url       = {https://proceedings.mlr.press/v137/lovatto20a.html}
}
Endnote
%0 Conference Paper
%T Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice
%A Ângelo G. Lovatto
%A Thiago P. Bueno
%A Denis D. Mauá
%A Leliane N. Barros
%B Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops
%C Proceedings of Machine Learning Research
%D 2020
%E Jessica Zosa Forde
%E Francisco Ruiz
%E Melanie F. Pradier
%E Aaron Schein
%F pmlr-v137-lovatto20a
%I PMLR
%P 76--86
%U https://proceedings.mlr.press/v137/lovatto20a.html
%V 137
APA
Lovatto, Â.G., Bueno, T.P., Mauá, D.D. & Barros, L.N. (2020). Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice. Proceedings on "I Can't Believe It's Not Better!" at NeurIPS Workshops, in Proceedings of Machine Learning Research 137:76-86. Available from https://proceedings.mlr.press/v137/lovatto20a.html.
