Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Scott Fujimoto, David Meger, Doina Precup, Ofir Nachum, Shixiang Shane Gu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:6918-6943, 2022.

Abstract

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.
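
As a quick numerical illustration of point (1), the sketch below is not taken from the paper's code: the toy Markov reward process, the variable names (n_states, gamma, and the constant offset c), and the random seed are all assumptions made here. It solves for the true values of a small random process, then shifts every value estimate by a constant; because both sides of the Bellman equation shift together, the Bellman error nearly cancels while the value error stays large.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): a value
# estimate can have a tiny Bellman error everywhere while its value error
# is large, because the two sides of the Bellman equation cancel.
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 10, 0.99

# Random Markov reward process under a fixed policy: P is the induced
# state-transition matrix, r is the expected reward per state.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)

# True value function V^pi solves (I - gamma * P) V = r.
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Perturb the true values by a constant offset c: both sides of the
# Bellman equation shift by (almost) the same amount.
c = 100.0
V_hat = V_true + c

bellman_error = V_hat - (r + gamma * P @ V_hat)  # residual of the Bellman equation
value_error = V_hat - V_true                     # distance to the true values

print("max |Bellman error|:", np.abs(bellman_error).max())  # (1 - gamma) * c = 1.0
print("max |value error|  :", np.abs(value_error).max())    # c = 100.0
```

The cancellation follows from the standard identity that the Bellman residual equals (I - gamma * P) applied to the value error, which shrinks errors along the constant direction by a factor of (1 - gamma); with only sampled transitions (the paper's point (2)), the residual can even be driven exactly to zero by many such inaccurate solutions.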

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-fujimoto22a,
  title     = {Why Should I Trust You, Bellman? {T}he {B}ellman Error is a Poor Replacement for Value Error},
  author    = {Fujimoto, Scott and Meger, David and Precup, Doina and Nachum, Ofir and Gu, Shixiang Shane},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {6918--6943},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/fujimoto22a/fujimoto22a.pdf},
  url       = {https://proceedings.mlr.press/v162/fujimoto22a.html},
  abstract  = {In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.}
}
APA
Fujimoto, S., Meger, D., Precup, D., Nachum, O. & Gu, S. S. (2022). Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:6918-6943. Available from https://proceedings.mlr.press/v162/fujimoto22a.html.
