Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition

Yash Chandak, Shantanu Thakoor, Zhaohan Daniel Guo, Yunhao Tang, Remi Munos, Will Dabney, Diana L Borsa
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:4009-4034, 2023.

Abstract

Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. Perhaps interestingly, we show that these representations also capture the relative frequency of state visitations, thereby providing an estimate of pseudo-counts for free. To scale this decomposition method to large-scale domains, we provide an algorithm that never requires building the transition matrix, can make use of deep networks, and also permits mini-batch training. Further, we draw inspiration from predictive state representations and extend our decomposition method to partially observable environments. With experiments on multi-task settings with partially observable domains, we show that the proposed method can not only learn useful representations on DM-Lab-30 environments (whose inputs include language instructions, pixel images, and rewards, among others) but can also be effective at hard exploration tasks in DM-Hard-8 environments.
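As an illustrative sketch only (not the paper's scalable mini-batch algorithm, which avoids ever building the transition matrix), both properties the abstract describes can be seen in the tabular case: an SVD of an empirical transition-count matrix yields low-rank state representations, and the norms of those representations track how often each state was visited. The toy chain MDP, the rank choice `k`, and the norm-based pseudo-count proxy below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain MDP: 6 states, a random walk biased to the right,
# so right-hand states are visited far more often. (Hypothetical setup.)
n = 6
states = [0]
s = 0
for _ in range(5000):
    step = 1 if rng.random() < 0.7 else -1
    s = min(max(s + step, 0), n - 1)
    states.append(s)

# Empirical transition-count matrix N[s, s'].
N = np.zeros((n, n))
for s, s2 in zip(states[:-1], states[1:]):
    N[s, s2] += 1

# SVD of the count matrix: the singular vectors capture the transition
# structure, and because V is orthogonal, the row norms of U @ diag(S)
# approximate the row norms of N, i.e. each state's visitation frequency.
U, S, Vt = np.linalg.svd(N)
k = 3                                  # representation rank (illustrative)
phi = U[:, :k] * S[:k]                 # k-dim state representations
pseudo = np.linalg.norm(phi, axis=1)   # pseudo-count proxy "for free"

visits = np.bincount(states, minlength=n)
corr = np.corrcoef(pseudo, visits)[0, 1]
print(phi.shape, round(float(corr), 2))
```

In this sketch the representation norms are almost perfectly correlated with the true visit counts, which is the sense in which the decomposition provides pseudo-count estimates as a by-product.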

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-chandak23a,
  title     = {Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition},
  author    = {Chandak, Yash and Thakoor, Shantanu and Guo, Zhaohan Daniel and Tang, Yunhao and Munos, Remi and Dabney, Will and Borsa, Diana L},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {4009--4034},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/chandak23a/chandak23a.pdf},
  url       = {https://proceedings.mlr.press/v202/chandak23a.html},
  abstract  = {Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. Perhaps interestingly, we show that these representations also capture the relative frequency of state visitations, thereby providing an estimate for pseudo-counts for free. To scale this decomposition method to large-scale domains, we provide an algorithm that never requires building the transition matrix, can make use of deep networks, and also permits mini-batch training. Further, we draw inspiration from predictive state representations and extend our decomposition method to partially observable environments. With experiments on multi-task settings with partially observable domains, we show that the proposed method can not only learn useful representation on DM-Lab-30 environments (that have inputs involving language instructions, pixel images, rewards, among others) but it can also be effective at hard exploration tasks in DM-Hard-8 environments.}
}
Endnote
%0 Conference Paper
%T Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition
%A Yash Chandak
%A Shantanu Thakoor
%A Zhaohan Daniel Guo
%A Yunhao Tang
%A Remi Munos
%A Will Dabney
%A Diana L Borsa
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-chandak23a
%I PMLR
%P 4009--4034
%U https://proceedings.mlr.press/v202/chandak23a.html
%V 202
%X Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. Perhaps interestingly, we show that these representations also capture the relative frequency of state visitations, thereby providing an estimate for pseudo-counts for free. To scale this decomposition method to large-scale domains, we provide an algorithm that never requires building the transition matrix, can make use of deep networks, and also permits mini-batch training. Further, we draw inspiration from predictive state representations and extend our decomposition method to partially observable environments. With experiments on multi-task settings with partially observable domains, we show that the proposed method can not only learn useful representation on DM-Lab-30 environments (that have inputs involving language instructions, pixel images, rewards, among others) but it can also be effective at hard exploration tasks in DM-Hard-8 environments.
APA
Chandak, Y., Thakoor, S., Guo, Z.D., Tang, Y., Munos, R., Dabney, W. & Borsa, D.L. (2023). Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:4009-4034. Available from https://proceedings.mlr.press/v202/chandak23a.html.
