Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:5556-5566, 2020.

Abstract

The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating 25% improvement on the most challenging Humanoid environment.
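The truncation idea named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each critic outputs a list of atoms (quantile estimates of the return), pools the atoms of all critics into one mixture, sorts them, and discards the largest ones. The function name `truncated_mixture`, the parameter `drop_per_critic`, and the toy numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def truncated_mixture(
    critic_atoms: Sequence[Sequence[float]], drop_per_critic: int
) -> List[float]:
    """Pool the atoms of all critics, sort them, and drop the largest ones.

    Hypothetical helper: drop_per_critic * len(critic_atoms) atoms are
    removed from the top of the pooled, sorted mixture.
    """
    n_critics = len(critic_atoms)
    # Ensembling: merge the atoms predicted by every critic into one pool.
    pooled = sorted(atom for atoms in critic_atoms for atom in atoms)
    n_drop = drop_per_critic * n_critics
    if not 0 <= n_drop < len(pooled):
        raise ValueError("drop_per_critic must leave at least one atom")
    # Truncation: keep only the smallest atoms of the pooled mixture.
    return pooled[: len(pooled) - n_drop]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Toy data (assumed, not from the paper): two critics, four atoms each.
critics = [[1.0, 2.0, 3.0, 10.0], [0.5, 2.5, 3.5, 12.0]]
kept = truncated_mixture(critics, drop_per_critic=1)
full_mean = mean([a for atoms in critics for a in atoms])
truncated_mean = mean(kept)
print(kept, full_mean, truncated_mean)
```

Raising `drop_per_critic` removes more of the upper tail and lowers the resulting value estimate, which is one way to read the abstract's claim that truncation gives fine-grained control over overestimation.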

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-kuznetsov20a,
  title = {Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics},
  author = {Kuznetsov, Arsenii and Shvechikov, Pavel and Grishin, Alexander and Vetrov, Dmitry},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages = {5556--5566},
  year = {2020},
  editor = {Daumé III, Hal and Singh, Aarti},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  month = {13--18 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v119/kuznetsov20a/kuznetsov20a.pdf},
  url = {https://proceedings.mlr.press/v119/kuznetsov20a.html},
  abstract = {The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating 25% improvement on the most challenging Humanoid environment.}
}
Endnote
%0 Conference Paper
%T Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
%A Arsenii Kuznetsov
%A Pavel Shvechikov
%A Alexander Grishin
%A Dmitry Vetrov
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-kuznetsov20a
%I PMLR
%P 5556--5566
%U https://proceedings.mlr.press/v119/kuznetsov20a.html
%V 119
%X The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating 25% improvement on the most challenging Humanoid environment.
APA
Kuznetsov, A., Shvechikov, P., Grishin, A. & Vetrov, D. (2020). Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:5556-5566. Available from https://proceedings.mlr.press/v119/kuznetsov20a.html.

Related Material