Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Remi Munos, Michal Valko
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:14780-14816, 2023.

Abstract

Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma’s Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from its predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a "noisy TV". In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome—which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics. First, we propose incorporating such hindsight representations into models to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to stochasticity. Second, we instantiate this framework for the recently introduced BYOL-Explore algorithm as our prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under a variety of different stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma’s Revenge with sticky actions, while preserving performance in the non-sticky setting.
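
To make the key idea concrete, below is a minimal sketch of a hindsight-conditioned curiosity reward. It is not the authors' BYOL-Hindsight: the learned networks are replaced by hand-coded stand-ins (the names target_embed, hindsight_code, and world_model are illustrative), and a toy linear transition with additive noise is assumed.

# Minimal, illustrative sketch of a hindsight-conditioned curiosity reward.
# NOT the authors' BYOL-Hindsight: networks are hand-coded stand-ins, and
# toy dynamics `next_obs = obs + action + noise` are assumed for clarity.
import numpy as np

rng = np.random.default_rng(0)

def target_embed(next_obs):
    # Stand-in for the target embedding of the next observation
    # (a BYOL-style target network in the real algorithm).
    return next_obs

def hindsight_code(obs, action, next_obs):
    # Stand-in hindsight representation: computed *after* the realized
    # outcome is observed, so it can capture the unpredictable part of
    # the step. Here it is hard-coded to the toy transition's noise term.
    return next_obs - (obs + action)

def world_model(obs, action, z):
    # Hindsight-conditioned prediction of the next embedding: the code z
    # explains away the noise, leaving only "novelty" in the residual.
    return obs + action + z

def intrinsic_reward(obs, action, next_obs):
    z = hindsight_code(obs, action, next_obs)
    err = target_embed(next_obs) - world_model(obs, action, z)
    return float(np.sum(err ** 2))  # reward reflects only the unexplained part

# A "noisy TV" step: the transition is pure noise around the toy dynamics.
obs, action = rng.normal(size=4), rng.normal(size=4)
next_obs = obs + action + 0.5 * rng.normal(size=4)
print(intrinsic_reward(obs, action, next_obs))  # 0.0: noise explained away

# Plain curiosity on the same step (no hindsight code) never vanishes.
plain = float(np.sum((target_embed(next_obs) - (obs + action)) ** 2))
print(plain)  # > 0: the agent would be rewarded for irreducible noise

In the actual algorithm, the hindsight representation and the world model are learned jointly, and the representation must be constrained so it cannot simply copy the entire outcome and collapse the reward; the hard-coded copy of the noise term above is purely for illustration.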

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-jarrett23a,
  title     = {Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments},
  author    = {Jarrett, Daniel and Tallec, Corentin and Altch\'{e}, Florent and Mesnard, Thomas and Munos, Remi and Valko, Michal},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {14780--14816},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/jarrett23a/jarrett23a.pdf},
  url       = {https://proceedings.mlr.press/v202/jarrett23a.html},
  abstract  = {Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma’s Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from its predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a "noisy TV". In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome—which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics. First, we propose incorporating such hindsight representations into models to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to stochasticity. Second, we instantiate this framework for the recently introduced BYOL-Explore algorithm as our prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under a variety of different stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma’s Revenge with sticky actions, while preserving performance in the non-sticky setting.}
}
Endnote
%0 Conference Paper
%T Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments
%A Daniel Jarrett
%A Corentin Tallec
%A Florent Altché
%A Thomas Mesnard
%A Remi Munos
%A Michal Valko
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-jarrett23a
%I PMLR
%P 14780--14816
%U https://proceedings.mlr.press/v202/jarrett23a.html
%V 202
%X Consider the problem of exploration in sparse-reward or reward-free environments, such as in Montezuma’s Revenge. In the curiosity-driven paradigm, the agent is rewarded for how much each realized outcome differs from its predicted outcome. But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a "noisy TV". In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome—which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics. First, we propose incorporating such hindsight representations into models to disentangle "noise" from "novelty", yielding Curiosity in Hindsight: a simple and scalable generalization of curiosity that is robust to stochasticity. Second, we instantiate this framework for the recently introduced BYOL-Explore algorithm as our prime example, resulting in the noise-robust BYOL-Hindsight. Third, we illustrate its behavior under a variety of different stochasticities in a grid world, and find improvements over BYOL-Explore in hard-exploration Atari games with sticky actions. Notably, we show state-of-the-art results in exploring Montezuma’s Revenge with sticky actions, while preserving performance in the non-sticky setting.
APA
Jarrett, D., Tallec, C., Altché, F., Mesnard, T., Munos, R. & Valko, M. (2023). Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:14780-14816. Available from https://proceedings.mlr.press/v202/jarrett23a.html.
