A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning

Alix Lheritier; Nicolas Bondoux

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5774-5789, 2022.

Abstract

Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low-complexity algorithm to compute the Cramér distance.
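
The two losses named in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function names are hypothetical, the quantile levels are assumed to be the midpoints (2i - 1) / (2N) used by QR-DQN, and the squared Cramér distance is computed by a direct sort-and-sum over the merged supports, not by the low-complexity algorithm the authors propose.

```python
import numpy as np


def quantile_midpoints(n):
    """Assumed fixed quantile levels tau_i = (2i - 1) / (2n), i = 1..n."""
    return (2 * np.arange(1, n + 1) - 1) / (2 * n)


def quantile_regression_loss(theta, samples, taus):
    """Mean pinball loss of atoms theta (one per level in taus) against samples."""
    u = samples[None, :] - theta[:, None]  # shape (n_atoms, n_samples)
    t = taus[:, None]
    # rho_tau(u) = tau * u if u >= 0, else (tau - 1) * u
    return float(np.mean(np.where(u >= 0, t * u, (t - 1) * u)))


def squared_cramer(xs, px, ys, py):
    """Integral of (F - G)^2 for two discrete distributions.

    xs, ys are atom locations; px, py are their probabilities.
    """
    support = np.concatenate([xs, ys])
    mass = np.concatenate([px, -np.asarray(py)])
    order = np.argsort(support, kind="stable")
    support, mass = support[order], mass[order]
    # F - G is piecewise constant between consecutive support points.
    cdf_diff = np.cumsum(mass)[:-1]
    widths = np.diff(support)
    return float(np.sum(cdf_diff ** 2 * widths))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.normal(size=1000)
    n_atoms = 5
    taus = quantile_midpoints(n_atoms)
    # Staircase approximation: equal-weight atoms at the empirical quantiles.
    theta = np.quantile(samples, taus)
    print("QR loss:", quantile_regression_loss(theta, samples, taus))
    print("squared Cramér:", squared_cramer(
        theta, np.full(n_atoms, 1 / n_atoms),
        samples, np.full(samples.size, 1 / samples.size)))
```

Placing the atoms at the empirical quantiles here is only a convenient way to build a staircase distribution for the demonstration; the sketch does not verify the projection or gradient results stated in the abstract.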

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-lheritier22a,
  title = 	 {A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning},
  author =       {Lheritier, Alix and Bondoux, Nicolas},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {5774--5789},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/lheritier22a/lheritier22a.pdf},
  url = 	 {https://proceedings.mlr.press/v151/lheritier22a.html},
  abstract = 	 {Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low-complexity algorithm to compute the Cramér distance.}
}

Endnote

%0 Conference Paper
%T A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning
%A Alix Lheritier
%A Nicolas Bondoux
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-lheritier22a
%I PMLR
%P 5774--5789
%U https://proceedings.mlr.press/v151/lheritier22a.html
%V 151
%X Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, due to biases in the gradients, the quantile regression loss is used instead for training, guaranteeing the same minimizer and enjoying unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low-complexity algorithm to compute the Cramér distance.

APA


Lheritier, A. & Bondoux, N. (2022). A Cramér Distance perspective on Quantile Regression based Distributional Reinforcement Learning. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:5774-5789. Available from https://proceedings.mlr.press/v151/lheritier22a.html.
