A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Khimya Khetarpal, Zhaohan Daniel Guo, Bernardo Avila Pires, Yunhao Tang, Clare Lyle, Mark Rowland, Nicolas Heess, Diana L Borsa, Arthur Guez, Will Dabney
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:181-189, 2025.

Abstract

Learning a good representation is a crucial challenge for reinforcement learning (RL) agents. Self-predictive algorithms jointly learn a latent representation and a dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model under the assumption of a fixed policy (BYOL-$\Pi$); this assumption is at odds with practical implementations, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework. Interestingly, we uncover that BYOL-$\Pi$ and BYOL-AC are related through the lens of variance. We unify the study of these objectives through two complementary lenses: a model-based perspective, in which each objective is related to a low-rank approximation of certain dynamics, and a model-free perspective, which relates the objectives to modified value, Q-value, and advantage functions. This mismatch with the true value functions is reflected in our empirical observation (in both linear and deep RL experiments) that BYOL-$\Pi$ and BYOL-AC either perform very similarly across many tasks or differ in a task-dependent way.
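
To make the contrast concrete, here is a minimal sketch of the two kinds of objectives, written in illustrative notation rather than the paper's own: $\Phi$ denotes the learned representation, $P$ and $P_a$ latent prediction matrices, $\pi$ the behavior policy, and $\mathrm{sg}(\cdot)$ a stop-gradient on the bootstrap target. The sketch assumes a linear latent predictor and squared-error bootstrapping; the paper's exact parameterization and normalization may differ.

$$\mathcal{L}_{\text{BYOL-}\Pi}(\Phi, P) \;=\; \mathbb{E}_{s,\,s' \sim \pi}\Big[\big\| P^{\top}\Phi(s) - \mathrm{sg}\big(\Phi(s')\big)\big\|_2^2\Big],$$
$$\mathcal{L}_{\text{BYOL-AC}}(\Phi, \{P_a\}) \;=\; \mathbb{E}_{s,\;a \sim \pi,\;s'}\Big[\big\| P_a^{\top}\Phi(s) - \mathrm{sg}\big(\Phi(s')\big)\big\|_2^2\Big].$$

The only difference is whether the latent prediction is conditioned on the sampled action; the ODE analysis in the paper studies the continuous-time dynamics of this kind of joint update on the representation and its predictor.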

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-khetarpal25a,
  title = {A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning},
  author = {Khetarpal, Khimya and Guo, Zhaohan Daniel and Pires, Bernardo Avila and Tang, Yunhao and Lyle, Clare and Rowland, Mark and Heess, Nicolas and Borsa, Diana L and Guez, Arthur and Dabney, Will},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {181--189},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/khetarpal25a/khetarpal25a.pdf},
  url = {https://proceedings.mlr.press/v258/khetarpal25a.html},
  abstract = {Learning a good representation is a crucial challenge for reinforcement learning (RL) agents. Self-predictive algorithms jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model in the case of a fixed policy (BYOL-$\Pi$); this assumption is at odds with practical implementations, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework. Interestingly, we uncover that BYOL-$\Pi$ and BYOL-AC are related through the lens of variance. We unify the study of these objectives through two complementary lenses; a model-based perspective, where each objective is related to low-rank approximation of certain dynamics, and a model-free perspective, which relates the objectives to modified value, Q-value, and Advantage functions. This mismatch with the true value functions leads to the empirical observation (in both linear and deep RL experiments) that BYOL-$\Pi$ and BYOL-AC are either very similar in performance across many tasks or task-dependent.}
}
Endnote
%0 Conference Paper
%T A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
%A Khimya Khetarpal
%A Zhaohan Daniel Guo
%A Bernardo Avila Pires
%A Yunhao Tang
%A Clare Lyle
%A Mark Rowland
%A Nicolas Heess
%A Diana L Borsa
%A Arthur Guez
%A Will Dabney
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-khetarpal25a
%I PMLR
%P 181--189
%U https://proceedings.mlr.press/v258/khetarpal25a.html
%V 258
%X Learning a good representation is a crucial challenge for reinforcement learning (RL) agents. Self-predictive algorithms jointly learn a latent representation and dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model in the case of a fixed policy (BYOL-$\Pi$); this assumption is at odds with practical implementations, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework. Interestingly, we uncover that BYOL-$\Pi$ and BYOL-AC are related through the lens of variance. We unify the study of these objectives through two complementary lenses; a model-based perspective, where each objective is related to low-rank approximation of certain dynamics, and a model-free perspective, which relates the objectives to modified value, Q-value, and Advantage functions. This mismatch with the true value functions leads to the empirical observation (in both linear and deep RL experiments) that BYOL-$\Pi$ and BYOL-AC are either very similar in performance across many tasks or task-dependent.
APA
Khetarpal, K., Guo, Z.D., Pires, B.A., Tang, Y., Lyle, C., Rowland, M., Heess, N., Borsa, D.L., Guez, A. & Dabney, W. (2025). A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:181-189. Available from https://proceedings.mlr.press/v258/khetarpal25a.html.