A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning

Alix Lheritier, Nicolas Bondoux
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5774-5789, 2022.

Abstract

Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, because the resulting gradients are biased, the quantile regression loss is used instead for training; it guarantees the same minimizer and enjoys unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low-complexity algorithm to compute the Cramér distance.
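The Cramér distance discussed above is the ℓ2 distance between cumulative distribution functions. The following is a minimal illustrative sketch for two discrete distributions with finite support, computed by integrating the squared CDF difference over the merged support; it is only a generic baseline, not the low-complexity algorithm proposed in the paper.

```python
import numpy as np

def cramer_distance(xs, ps, ys, qs):
    """Cramér (ell_2) distance between two discrete distributions.

    xs, ps: support points and probabilities of the first distribution.
    ys, qs: support points and probabilities of the second distribution.
    Returns sqrt( integral (F(t) - G(t))^2 dt ), where F and G are the
    two CDFs; the integral is a finite sum because both CDFs are
    piecewise constant between consecutive merged support points.
    """
    # Merge and sort all support points of both distributions.
    pts = np.sort(np.concatenate([xs, ys]))
    # CDF value of each distribution just after each merged point.
    F = np.array([ps[xs <= t].sum() for t in pts])
    G = np.array([qs[ys <= t].sum() for t in pts])
    # Lengths of the intervals between consecutive merged points.
    widths = np.diff(pts)
    # Integrate the squared CDF gap over each constant piece.
    return np.sqrt(np.sum((F[:-1] - G[:-1]) ** 2 * widths))
```

For example, two unit point masses at 0 and 1 have CDFs differing by 1 on the interval [0, 1), so their Cramér distance is 1.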

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-lheritier22a,
  title     = {A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning},
  author    = {Lheritier, Alix and Bondoux, Nicolas},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {5774--5789},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/lheritier22a/lheritier22a.pdf},
  url       = {https://proceedings.mlr.press/v151/lheritier22a.html},
  abstract  = {Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performances. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions into a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramer distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramer and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low complexity algorithm to compute the Cramer distance.}
}
Endnote
%0 Conference Paper
%T A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning
%A Alix Lheritier
%A Nicolas Bondoux
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-lheritier22a
%I PMLR
%P 5774--5789
%U https://proceedings.mlr.press/v151/lheritier22a.html
%V 151
%X Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performances. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions into a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramer distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramer and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low complexity algorithm to compute the Cramer distance.
APA
Lheritier, A. & Bondoux, N. (2022). A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:5774-5789. Available from https://proceedings.mlr.press/v151/lheritier22a.html.