ConQUR: Mitigating Delusional Bias in Deep Q-Learning

Dijia Su; Jayden Ooi; Tyler Lu; Dale Schuurmans; Craig Boutilier

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9187-9195, 2020.

Abstract

Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.

Cite this Paper

BibTeX

@InProceedings{pmlr-v119-su20c,
  title = 	 {{C}on{QUR}: Mitigating Delusional Bias in Deep Q-Learning},
  author =       {Su, Dijia and Ooi, Jayden and Lu, Tyler and Schuurmans, Dale and Boutilier, Craig},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {9187--9195},
  year = 	 {2020},
  editor = 	 {Daumé III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/su20c/su20c.pdf},
  url = 	 {https://proceedings.mlr.press/v119/su20c.html},
  abstract = 	 {Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.}
}

Endnote

%0 Conference Paper
%T ConQUR: Mitigating Delusional Bias in Deep Q-Learning
%A Dijia Su
%A Jayden Ooi
%A Tyler Lu
%A Dale Schuurmans
%A Craig Boutilier
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-su20c
%I PMLR
%P 9187--9195
%U https://proceedings.mlr.press/v119/su20c.html
%V 119
%X Delusional bias is a fundamental source of error in approximate Q-learning. To date, the only techniques that explicitly address delusion require comprehensive search using tabular value estimates. In this paper, we develop efficient methods to mitigate delusional bias by training Q-approximators with labels that are "consistent" with the underlying greedy policy class. We introduce a simple penalization scheme that encourages Q-labels used across training batches to remain (jointly) consistent with the expressible policy class. We also propose a search framework that allows multiple Q-approximators to be generated and tracked, thus mitigating the effect of premature (implicit) policy commitments. Experimental results demonstrate that these methods can improve the performance of Q-learning in a variety of Atari games, sometimes dramatically.

APA

Su, D., Ooi, J., Lu, T., Schuurmans, D. & Boutilier, C. (2020). ConQUR: Mitigating Delusional Bias in Deep Q-Learning. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9187-9195. Available from https://proceedings.mlr.press/v119/su20c.html.
