Quantile Credit Assignment

Thomas Mesnard, Wenqi Chen, Alaa Saade, Yunhao Tang, Mark Rowland, Theophane Weber, Clare Lyle, Audrunas Gruslys, Michal Valko, Will Dabney, Georg Ostrovski, Eric Moulines, Remi Munos
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:24517-24531, 2023.

Abstract

In reinforcement learning, the credit assignment problem is to distinguish luck from skill, that is, separate the inherent randomness in the environment from the controllable effects of the agent’s actions. This paper proposes two novel algorithms, Quantile Credit Assignment (QCA) and Hindsight QCA (HQCA), which incorporate distributional value estimation to perform credit assignment. QCA uses a network that predicts the quantiles of the return distribution, whereas HQCA additionally incorporates information about the future. Both QCA and HQCA have the appealing interpretation of leveraging an estimate of the quantile level of the return (interpreted as the level of "luck") in order to derive a "luck-dependent" baseline for policy gradient methods. We show theoretically that this approach gives an unbiased policy gradient estimate that can yield significant variance reductions over a standard value estimate baseline. QCA and HQCA significantly outperform prior state-of-the-art methods on a range of extremely difficult credit assignment problems.
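To make the idea of a "luck-dependent" baseline concrete, the following is a minimal illustrative sketch in Python. All names here (quantile_level, luck_dependent_baseline) and the discrete K-quantile approximation are assumptions for illustration, not the authors' implementation; in particular, the exact distributions used to estimate the quantile level differ between QCA and HQCA (HQCA additionally conditions on hindsight information about the future). The sketch only shows the shape of the computation described in the abstract: estimate the quantile level tau of the observed return (the "luck"), read a baseline off a quantile function at that same level, and use the difference as the advantage in a policy-gradient update.

import numpy as np

def quantile_level(quantiles, g):
    """Estimate tau, the quantile level of return g under a predicted
    return distribution, as the fraction of predicted quantiles below g.
    This tau is interpreted as the level of "luck" in the trajectory."""
    return np.searchsorted(np.sort(quantiles), g) / len(quantiles)

def luck_dependent_baseline(value_quantiles, tau):
    """Read the baseline off the state-value quantile function at level
    tau, so a lucky (high-tau) return is compared against a
    correspondingly high baseline."""
    k = len(value_quantiles)
    idx = min(int(tau * k), k - 1)
    return np.sort(value_quantiles)[idx]

# Toy example: advantage for the policy-gradient estimator
#   grad J ~= (G - b(tau)) * grad log pi(a|s),
# which remains unbiased because the baseline does not change the
# expectation of the score-function term.
g = 1.5                                                   # observed return G
tau = quantile_level(np.array([-1.0, 0.0, 0.5, 1.0, 2.0]), g)   # estimated "luck"
b = luck_dependent_baseline(np.array([-0.5, 0.2, 0.8, 1.2, 2.5]), tau)
advantage = g - b

Both quantile arrays above are toy stand-ins for the outputs of the quantile network; which distribution each is conditioned on is exactly what distinguishes the variants in the paper.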

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-mesnard23a,
  title     = {Quantile Credit Assignment},
  author    = {Mesnard, Thomas and Chen, Wenqi and Saade, Alaa and Tang, Yunhao and Rowland, Mark and Weber, Theophane and Lyle, Clare and Gruslys, Audrunas and Valko, Michal and Dabney, Will and Ostrovski, Georg and Moulines, Eric and Munos, Remi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {24517--24531},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/mesnard23a/mesnard23a.pdf},
  url       = {https://proceedings.mlr.press/v202/mesnard23a.html}
}
EndNote
%0 Conference Paper
%T Quantile Credit Assignment
%A Thomas Mesnard
%A Wenqi Chen
%A Alaa Saade
%A Yunhao Tang
%A Mark Rowland
%A Theophane Weber
%A Clare Lyle
%A Audrunas Gruslys
%A Michal Valko
%A Will Dabney
%A Georg Ostrovski
%A Eric Moulines
%A Remi Munos
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-mesnard23a
%I PMLR
%P 24517--24531
%U https://proceedings.mlr.press/v202/mesnard23a.html
%V 202
APA
Mesnard, T., Chen, W., Saade, A., Tang, Y., Rowland, M., Weber, T., Lyle, C., Gruslys, A., Valko, M., Dabney, W., Ostrovski, G., Moulines, E. & Munos, R. (2023). Quantile Credit Assignment. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:24517-24531. Available from https://proceedings.mlr.press/v202/mesnard23a.html.