Understanding Self-Predictive Learning for Reinforcement Learning

Yunhao Tang; Zhaohan Daniel Guo; Pierre Harvey Richemond; Bernardo Avila Pires; Yash Chandak; Remi Munos; Mark Rowland; Mohammad Gheshlaghi Azar; Charline Le Lan; Clare Lyle; András György; Shantanu Thakoor; Will Dabney; Bilal Piot; Daniele Calandriello; Michal Valko

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:33632-33656, 2023.

Abstract

We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite their recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet converging to such solutions is clearly undesirable. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that self-predictive learning dynamics carry out spectral decomposition on the state transition matrix, effectively capturing information about the transition dynamics. Building on these theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.
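
The two ingredients named in the abstract, a predictor optimized faster than the representation and semi-gradient updates on the representation, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a tabular setting with one-hot states under a uniform state distribution, a symmetric ring random walk as the transition matrix, a linear representation `Phi` and linear predictor `P`, and the limiting case in which the predictor is solved exactly by least squares at every step. All function names, sizes, and step sizes are hypothetical choices made for illustration.

```python
import numpy as np

def ring_transition(n):
    """Symmetric random walk on a ring of n states (illustrative choice of T)."""
    T = np.zeros((n, n))
    for i in range(n):
        T[i, (i - 1) % n] = 0.5
        T[i, (i + 1) % n] = 0.5
    return T

def optimal_predictor(Phi, T):
    """Least-squares predictor minimizing ||Phi P - T Phi||_F^2 for fixed Phi.

    Solving for P exactly stands in for 'faster paced' predictor optimization.
    """
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ T @ Phi)

def semi_gradient_step(Phi, T, lr):
    """One representation update; the target T @ Phi is treated as a constant."""
    P = optimal_predictor(Phi, T)
    residual = Phi @ P - T @ Phi   # prediction error in latent space
    grad = residual @ P.T          # gradient through the prediction branch only
    return Phi - lr * grad

def run(n=10, k=2, steps=200, lr=0.01, seed=0):
    """Run the sketch from a random orthonormal initialization of Phi."""
    rng = np.random.default_rng(seed)
    Phi, _ = np.linalg.qr(rng.standard_normal((n, k)))
    T = ring_transition(n)
    for _ in range(steps):
        Phi = semi_gradient_step(Phi, T, lr)
    return Phi

Phi_final = run()
# The Gram matrix stays close to its initial value (the identity here),
# so the columns of Phi do not shrink to zero or merge into one direction.
print(np.round(Phi_final.T @ Phi_final, 3))
```

Under these assumptions the residual is orthogonal to the columns of `Phi` whenever `P` is the least-squares solution, so the first-order change in `Phi.T @ Phi` cancels and only a term of order `lr**2` remains per step. That is why the sketch keeps a full-rank, non-constant representation even though a constant `Phi` would also drive the prediction error to zero.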

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-tang23d,
  title = 	 {Understanding Self-Predictive Learning for Reinforcement Learning},
  author =       {Tang, Yunhao and Guo, Zhaohan Daniel and Richemond, Pierre Harvey and Avila Pires, Bernardo and Chandak, Yash and Munos, Remi and Rowland, Mark and Gheshlaghi Azar, Mohammad and Le Lan, Charline and Lyle, Clare and Gy\"{o}rgy, Andr\'{a}s and Thakoor, Shantanu and Dabney, Will and Piot, Bilal and Calandriello, Daniele and Valko, Michal},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {33632--33656},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/tang23d/tang23d.pdf},
  url = 	 {https://proceedings.mlr.press/v202/tang23d.html},
  abstract = 	 {We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful designs of the optimization dynamics are critical to learning meaningful representations. We identify that a faster paced optimization of the predictor and semi-gradient updates on the representation, are crucial to preventing the representation collapse. Then in an idealized setup, we show self-predictive learning dynamics carries out spectral decomposition on the state transition matrix, effectively capturing information of the transition dynamics. Building on the theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.}
}

Endnote

%0 Conference Paper
%T Understanding Self-Predictive Learning for Reinforcement Learning
%A Yunhao Tang
%A Zhaohan Daniel Guo
%A Pierre Harvey Richemond
%A Bernardo Avila Pires
%A Yash Chandak
%A Remi Munos
%A Mark Rowland
%A Mohammad Gheshlaghi Azar
%A Charline Le Lan
%A Clare Lyle
%A András György
%A Shantanu Thakoor
%A Will Dabney
%A Bilal Piot
%A Daniele Calandriello
%A Michal Valko
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-tang23d
%I PMLR
%P 33632--33656
%U https://proceedings.mlr.press/v202/tang23d.html
%V 202
%X We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful designs of the optimization dynamics are critical to learning meaningful representations. We identify that a faster paced optimization of the predictor and semi-gradient updates on the representation, are crucial to preventing the representation collapse. Then in an idealized setup, we show self-predictive learning dynamics carries out spectral decomposition on the state transition matrix, effectively capturing information of the transition dynamics. Building on the theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments and showcase the promise of the novel representation learning algorithm with large-scale experiments.

APA


Tang, Y., Guo, Z.D., Richemond, P.H., Avila Pires, B., Chandak, Y., Munos, R., Rowland, M., Gheshlaghi Azar, M., Le Lan, C., Lyle, C., György, A., Thakoor, S., Dabney, W., Piot, B., Calandriello, D. & Valko, M. (2023). Understanding Self-Predictive Learning for Reinforcement Learning. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:33632-33656. Available from https://proceedings.mlr.press/v202/tang23d.html.
