A Distributional Analogue to the Successor Representation

Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, Andre Barreto, Will Dabney, Marc G Bellemare, Mark Rowland
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:52994-53016, 2024.

Abstract

This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
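For orientation, the following sketch (in our own illustrative notation, not copied from the paper) shows the objects the abstract refers to: the successor measure as an expected discounted state occupancy, the random occupancy measure whose law is the distributional SM, and the separation of transition structure from reward that enables zero-shot evaluation.

% A hedged LaTeX sketch; notation is illustrative, not verbatim from the paper.

% Successor measure (SM): expected discounted state occupancy under policy pi.
\[
  \Psi^\pi(x, A) \;=\; (1-\gamma) \sum_{t \ge 0} \gamma^t \,
  \Pr\bigl(X_t \in A \mid X_0 = x\bigr).
\]

% Random occupancy measure of a single trajectory started at x:
\[
  M^\pi_x \;=\; (1-\gamma) \sum_{t \ge 0} \gamma^t \, \delta_{X_t},
  \qquad X_{t+1} \sim P^\pi(\cdot \mid X_t).
\]

% The distributional SM is the law of M^pi_x: a distribution over
% occupancy distributions, whose mean recovers the ordinary SM,
% E[M^pi_x] = Psi^pi(x, .).

% Separation of transition structure and reward: for any reward r,
% the random return is a linear functional of M^pi_x,
\[
  G^\pi(x) \;=\; \frac{1}{1-\gamma} \int r \,\mathrm{d}M^\pi_x
          \;=\; \sum_{t \ge 0} \gamma^t \, r(X_t),
\]
% so a single learned model of the law of M^pi_x supports zero-shot
% (including risk-sensitive) evaluation across many reward functions.
% The "two-level" maximum mean discrepancy in the abstract refers to an
% MMD defined over distributions of measures; we do not reproduce the
% paper's exact training loss here.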

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wiltzer24a,
  title     = {A Distributional Analogue to the Successor Representation},
  author    = {Wiltzer, Harley and Farebrother, Jesse and Gretton, Arthur and Tang, Yunhao and Barreto, Andre and Dabney, Will and Bellemare, Marc G and Rowland, Mark},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {52994--53016},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wiltzer24a/wiltzer24a.pdf},
  url       = {https://proceedings.mlr.press/v235/wiltzer24a.html},
  abstract  = {This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.}
}
Endnote
%0 Conference Paper
%T A Distributional Analogue to the Successor Representation
%A Harley Wiltzer
%A Jesse Farebrother
%A Arthur Gretton
%A Yunhao Tang
%A Andre Barreto
%A Will Dabney
%A Marc G Bellemare
%A Mark Rowland
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wiltzer24a
%I PMLR
%P 52994--53016
%U https://proceedings.mlr.press/v235/wiltzer24a.html
%V 235
%X This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
APA
Wiltzer, H., Farebrother, J., Gretton, A., Tang, Y., Barreto, A., Dabney, W., Bellemare, M.G. & Rowland, M. (2024). A Distributional Analogue to the Successor Representation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:52994-53016. Available from https://proceedings.mlr.press/v235/wiltzer24a.html.
