Distributional Reinforcement Learning with Dual Expectile-Quantile Regression

Sami Jullien; Romain Deffayet; Jean-Michel Renders; Paul Groth; Maarten de Rijke

Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:1909-1923, 2025.

Abstract

Distributional reinforcement learning (RL) has proven useful in multiple benchmarks, as it enables approximating the full distribution of returns and extracting rich feedback from environment samples. The commonly used quantile regression approach to distributional RL, based on asymmetric $L_1$ losses, provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinite estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
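The three losses contrasted in the abstract (asymmetric $L_1$, asymmetric Huber, asymmetric $L_2$) can be compared directly on a static sample. The sketch below is an illustration under our own assumptions (NumPy, a brute-force grid search, an exponential toy "return" sample) and is not the authors' implementation; the function names `quantile_loss`, `quantile_huber_loss`, `expectile_loss` and `fit` are hypothetical. The Huber variant follows the common QR-DQN/IQN form $|\tau - \mathbb{1}\{u<0\}|\,L_\kappa(u)/\kappa$ with residual $u$ = target minus prediction. It shows that the $L_1$ loss recovers quantiles, the $L_2$ loss recovers expectiles (the mean at $\tau = 0.5$), and that the Huber loss, once every residual lies in its quadratic zone, recovers the expectile rather than the quantile.

```python
import numpy as np


def asym_weight(u, tau):
    # |tau - 1{u < 0}|: weight tau on positive residuals, 1 - tau on negative ones
    return np.abs(tau - (u < 0).astype(float))


def quantile_loss(u, tau):
    # Asymmetric L1 ("pinball") loss; its minimiser is the tau-quantile.
    return asym_weight(u, tau) * np.abs(u)


def quantile_huber_loss(u, tau, kappa=1.0):
    # Asymmetric Huber loss (QR-DQN/IQN form): quadratic for |u| <= kappa, linear beyond.
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    return asym_weight(u, tau) * huber / kappa


def expectile_loss(u, tau):
    # Asymmetric L2 loss; its minimiser is the tau-expectile (the mean when tau = 0.5).
    return asym_weight(u, tau) * u**2


def fit(loss, samples, tau, grid, **kwargs):
    # Brute-force minimisation of the empirical loss over candidate estimates theta.
    u = samples[None, :] - grid[:, None]  # residuals u = target - prediction
    return grid[np.argmin(loss(u, tau, **kwargs).mean(axis=1))]


rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=2001)  # skewed toy sample (hypothetical data)
grid = np.linspace(0.0, 4.0, 801)                # candidate estimates, step 0.005

q50 = fit(quantile_loss, samples, 0.5, grid)
e50 = fit(expectile_loss, samples, 0.5, grid)
h50 = fit(quantile_huber_loss, samples, 0.5, grid, kappa=100.0)
q90 = fit(quantile_loss, samples, 0.9, grid)
e90 = fit(expectile_loss, samples, 0.9, grid)
h90 = fit(quantile_huber_loss, samples, 0.9, grid, kappa=100.0)

print(f"tau=0.5  quantile {q50:.3f}  expectile {e50:.3f}  huber(kappa=100) {h50:.3f}")
print(f"tau=0.9  quantile {q90:.3f}  expectile {e90:.3f}  huber(kappa=100) {h90:.3f}")
```

With $\kappa = 100$, every residual in this toy sample falls inside the quadratic zone, so the Huber loss is proportional to the expectile loss and shares its minimiser; shrinking $\kappa$ toward 0 moves it back toward the pinball loss.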

Cite this Paper

BibTeX

@InProceedings{pmlr-v286-jullien25a,
  title     = {Distributional Reinforcement Learning with Dual Expectile-Quantile Regression},
  author    = {Jullien, Sami and Deffayet, Romain and Renders, Jean-Michel and Groth, Paul and de Rijke, Maarten},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages     = {1909--1923},
  year      = {2025},
  editor    = {Chiappa, Silvia and Magliacane, Sara},
  volume    = {286},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--25 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/jullien25a/jullien25a.pdf},
  url       = {https://proceedings.mlr.press/v286/jullien25a.html},
  abstract  = {Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and extracts a rich feedback from environment samples. The commonly used quantile regression approach to distributional RL – based on asymmetric $L_1$ losses – provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinite estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.}
}

Endnote

%0 Conference Paper
%T Distributional Reinforcement Learning with Dual Expectile-Quantile Regression
%A Sami Jullien
%A Romain Deffayet
%A Jean-Michel Renders
%A Paul Groth
%A Maarten de Rijke
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-jullien25a
%I PMLR
%P 1909--1923
%U https://proceedings.mlr.press/v286/jullien25a.html
%V 286
%X Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and extracts a rich feedback from environment samples. The commonly used quantile regression approach to distributional RL – based on asymmetric $L_1$ losses – provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, asymmetric hybrid $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our proposed operator converges to the distributional Bellman operator in the limit of infinite estimated quantile and expectile fractions, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.

APA

Jullien, S., Deffayet, R., Renders, J.-M., Groth, P., & de Rijke, M. (2025). Distributional Reinforcement Learning with Dual Expectile-Quantile Regression. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:1909-1923. Available from https://proceedings.mlr.press/v286/jullien25a.html.