Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics

Tyler Kastner, Mark Rowland, Yunhao Tang, Murat A Erdogdu, Amir-Massoud Farahmand
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:29294-29320, 2025.

Abstract

We study the problem of distributional reinforcement learning using categorical parametrisations and a KL divergence loss. Previous work analyzing categorical distributional RL has done so using a Cramér distance-based loss, simplifying the analysis but creating a theory-practice gap. We introduce a preconditioned version of the algorithm, and prove that it is guaranteed to converge. We further derive the asymptotic variance of the categorical estimates under different learning rate regimes, and compare it to that of classical reinforcement learning. Finally, we empirically validate our theoretical results, conduct an empirical investigation into the relative strengths of KL losses, and derive a number of actionable insights for practitioners.
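For readers unfamiliar with the setting, the following is a minimal, purely illustrative sketch of a tabular categorical (C51-style) TD update that minimises a KL divergence between a projected target distribution and a softmax-parameterised estimate. It is not the paper's preconditioned algorithm; all names and hyperparameters (num_atoms, v_min, v_max, step_size, the two-state example) are assumptions made for illustration.

import numpy as np

num_atoms, v_min, v_max = 51, 0.0, 10.0
support = np.linspace(v_min, v_max, num_atoms)   # fixed atom locations z_1, ..., z_m
delta_z = support[1] - support[0]

def project_target(reward, gamma, next_probs):
    """Categorical projection of the distribution of r + gamma * Z' onto the fixed support."""
    target = np.zeros(num_atoms)
    tz = np.clip(reward + gamma * support, v_min, v_max)  # shifted and scaled atoms
    b = (tz - v_min) / delta_z                            # fractional grid index of each shifted atom
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    same = (lo == hi).astype(float)                       # atom lands exactly on a grid point
    np.add.at(target, lo, next_probs * (hi - b + same))   # split mass between neighbouring atoms
    np.add.at(target, hi, next_probs * (b - lo))
    return target

def kl_step(logits, target_probs, step_size=0.1):
    """One gradient step on KL(target || softmax(logits)); the gradient of the
    cross-entropy with respect to the logits is softmax(logits) - target."""
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return logits - step_size * (probs - target_probs)

# Single sampled transition (s, r, s'): build the projected target from the
# current estimate at s', then take one KL gradient step on the logits at s.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, num_atoms))                  # two states, softmax-parameterised
reward, gamma = 1.0, 0.9
next_probs = np.exp(logits[1]) / np.exp(logits[1]).sum()
target = project_target(reward, gamma, next_probs)
logits[0] = kl_step(logits[0], target)

The softmax parameterisation keeps each estimate on the probability simplex, which is what gives the KL (cross-entropy) gradient the simple softmax-minus-target form used above.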

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-kastner25a,
  title     = {Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics},
  author    = {Kastner, Tyler and Rowland, Mark and Tang, Yunhao and Erdogdu, Murat A and Farahmand, Amir-Massoud},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {29294--29320},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kastner25a/kastner25a.pdf},
  url       = {https://proceedings.mlr.press/v267/kastner25a.html},
  abstract  = {We study the problem of distributional reinforcement learning using categorical parametrisations and a KL divergence loss. Previous work analyzing categorical distributional RL has done so using a Cramér distance-based loss, simplifying the analysis but creating a theory-practice gap. We introduce a preconditioned version of the algorithm, and prove that it is guaranteed to converge. We further derive the asymptotic variance of the categorical estimates under different learning rate regimes, and compare to that of classical reinforcement learning. We finally empirically validate our theoretical results and perform an empirical investigation into the relative strengths of using KL losses, and derive a number of actionable insights for practitioners.}
}
Endnote
%0 Conference Paper
%T Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics
%A Tyler Kastner
%A Mark Rowland
%A Yunhao Tang
%A Murat A Erdogdu
%A Amir-Massoud Farahmand
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kastner25a
%I PMLR
%P 29294--29320
%U https://proceedings.mlr.press/v267/kastner25a.html
%V 267
%X We study the problem of distributional reinforcement learning using categorical parametrisations and a KL divergence loss. Previous work analyzing categorical distributional RL has done so using a Cramér distance-based loss, simplifying the analysis but creating a theory-practice gap. We introduce a preconditioned version of the algorithm, and prove that it is guaranteed to converge. We further derive the asymptotic variance of the categorical estimates under different learning rate regimes, and compare to that of classical reinforcement learning. We finally empirically validate our theoretical results and perform an empirical investigation into the relative strengths of using KL losses, and derive a number of actionable insights for practitioners.
APA
Kastner, T., Rowland, M., Tang, Y., Erdogdu, M.A. & Farahmand, A.M. (2025). Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:29294-29320. Available from https://proceedings.mlr.press/v267/kastner25a.html.