Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

Vivek Myers, Chongyi Zheng, Anca Dragan, Sergey Levine, Benjamin Eysenbach
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:37076-37096, 2024.

Abstract

Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these prior approaches do not satisfy the triangle inequality. This is not merely a definitional concern, but translates to an inability to generalize and find shortest paths. In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. Importantly, this temporal distance is computationally efficient to estimate, even in high-dimensional and stochastic settings. Experiments in controlled settings and benchmark suites demonstrate that an RL algorithm based on these new temporal distances exhibits combinatorial generalization (i.e., "stitching") and can sometimes learn more quickly than prior methods, including those based on quasimetrics.
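The abstract's central requirement can be stated compactly. The sketch below is not drawn from the paper's text: it only restates the quasimetric axioms the abstract refers to, together with one common way (an assumption on our part, not necessarily the paper's exact construction) of turning a discounted state-occupancy ratio into such a distance.

\documentclass{article}
\usepackage{amsmath}
\begin{document}

% Quasimetric axioms for a temporal distance d over states
% (the property the abstract says prior stochastic approaches fail to satisfy):
\begin{align*}
  d(x, x) &= 0                           && \text{(zero self-distance)} \\
  d(x, z) &\le d(x, y) + d(y, z)         && \text{(triangle inequality)}
\end{align*}
% Symmetry $d(x,y)=d(y,x)$ is not required: $d$ is a quasimetric, matching
% the directed nature of reaching goals under the environment dynamics.

% One illustrative change of variables (an assumption for exposition):
% compare the discounted occupancy of a goal $g$ when starting at $g$
% versus when starting at $s$,
\[
  d(s, g) \;=\; \log p^{\pi}(s_f = g \mid s_0 = g)
          \;-\; \log p^{\pi}(s_f = g \mid s_0 = s),
\]
% where $p^{\pi}(s_f = \cdot \mid s_0)$ is the discounted state-occupancy
% distribution that contrastive successor features estimate.

\end{document}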

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-myers24a,
  title     = {Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making},
  author    = {Myers, Vivek and Zheng, Chongyi and Dragan, Anca and Levine, Sergey and Eysenbach, Benjamin},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {37076--37096},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/myers24a/myers24a.pdf},
  url       = {https://proceedings.mlr.press/v235/myers24a.html},
  abstract  = {Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these prior approaches do not satisfy the triangle inequality. This is not merely a definitional concern, but translates to an inability to generalize and find shortest paths. In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. Importantly, this temporal distance is computationally efficient to estimate, even in high-dimensional and stochastic settings. Experiments in controlled settings and benchmark suites demonstrate that an RL algorithm based on these new temporal distances exhibits combinatorial generalization (i.e., "stitching") and can sometimes learn more quickly than prior methods, including those based on quasimetrics.}
}
Endnote
%0 Conference Paper
%T Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making
%A Vivek Myers
%A Chongyi Zheng
%A Anca Dragan
%A Sergey Levine
%A Benjamin Eysenbach
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-myers24a
%I PMLR
%P 37076--37096
%U https://proceedings.mlr.press/v235/myers24a.html
%V 235
%X Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning that involve reaching goals, allowing one to estimate the transit time between two states. However, prior attempts to define such temporal distances in stochastic settings have been stymied by an important limitation: these prior approaches do not satisfy the triangle inequality. This is not merely a definitional concern, but translates to an inability to generalize and find shortest paths. In this paper, we build on prior work in contrastive learning and quasimetrics to show how successor features learned by contrastive learning (after a change of variables) form a temporal distance that does satisfy the triangle inequality, even in stochastic settings. Importantly, this temporal distance is computationally efficient to estimate, even in high-dimensional and stochastic settings. Experiments in controlled settings and benchmark suites demonstrate that an RL algorithm based on these new temporal distances exhibits combinatorial generalization (i.e., "stitching") and can sometimes learn more quickly than prior methods, including those based on quasimetrics.
APA
Myers, V., Zheng, C., Dragan, A., Levine, S., & Eysenbach, B. (2024). Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:37076-37096. Available from https://proceedings.mlr.press/v235/myers24a.html.
