A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

Mikael Henaff; Minqi Jiang; Roberta Raileanu

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:12972-12999, 2023.

Abstract

Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent’s entire training experience, and episodic novelty bonuses, computed using only experience from the current episode. However, the use of these two types of bonuses has been ad-hoc and poorly understood. In this work, we shed light on the behavior of these two types of bonuses through controlled experiments on easily interpretable tasks as well as challenging pixel-based settings. We find that the two types of bonuses succeed in different settings, with episodic bonuses being most effective when there is little shared structure across episodes and global bonuses being effective when more structure is shared. We develop a conceptual framework which makes this notion of shared structure precise by considering the variance of the value function across contexts, and which provides a unifying explanation of our empirical results. We furthermore find that combining the two bonuses can lead to more robust performance across different degrees of shared structure, and investigate different algorithmic choices for defining and combining global and episodic bonuses based on function approximation. This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma’s Revenge.
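The distinction between the two bonus types can be illustrated with a minimal count-based sketch. This is not the algorithm from the paper, which defines its bonuses using function approximation. The class name `CountBonuses`, the inverse-square-root form of the bonus, and the multiplicative combination are all assumptions made for illustration only. What the sketch does take from the abstract is that the global bonus draws on the agent's entire training experience, while the episodic bonus draws only on the current episode.

```python
from collections import Counter
from math import sqrt


class CountBonuses:
    """Hypothetical count-based novelty bonuses (illustration only)."""

    def __init__(self):
        # Global counts persist over the agent's entire training experience.
        self.global_counts = Counter()
        # Episodic counts only cover the current episode.
        self.episodic_counts = Counter()

    def start_episode(self):
        # Forget within-episode visits; global counts are left untouched.
        self.episodic_counts.clear()

    def observe(self, state):
        """Record a visit to a hashable state and return
        (global_bonus, episodic_bonus, combined_bonus)."""
        self.global_counts[state] += 1
        self.episodic_counts[state] += 1
        global_bonus = 1.0 / sqrt(self.global_counts[state])
        episodic_bonus = 1.0 / sqrt(self.episodic_counts[state])
        # Assumed combination rule: the product of the two bonuses.
        return global_bonus, episodic_bonus, global_bonus * episodic_bonus


bonuses = CountBonuses()
bonuses.start_episode()
first = bonuses.observe("s0")   # first visit ever: both bonuses are maximal
second = bonuses.observe("s0")  # revisit within the same episode: both shrink
bonuses.start_episode()
third = bonuses.observe("s0")   # new episode: only the global bonus stays small
```

In this sketch a state revisited in a fresh episode regains its full episodic bonus while its global bonus keeps decaying, which is the behavioral difference the abstract contrasts.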

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-henaff23a,
  title = 	 {A Study of Global and Episodic Bonuses for Exploration in Contextual {MDP}s},
  author =       {Henaff, Mikael and Jiang, Minqi and Raileanu, Roberta},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {12972--12999},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/henaff23a/henaff23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/henaff23a.html},
  abstract = 	 {Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent’s entire training experience, and episodic novelty bonuses, computed using only experience from the current episode. However, the use of these two types of bonuses has been ad-hoc and poorly understood. In this work, we shed light on the behavior of these two types of bonuses through controlled experiments on easily interpretable tasks as well as challenging pixel-based settings. We find that the two types of bonuses succeed in different settings, with episodic bonuses being most effective when there is little shared structure across episodes and global bonuses being effective when more structure is shared. We develop a conceptual framework which makes this notion of shared structure precise by considering the variance of the value function across contexts, and which provides a unifying explanation of our empirical results. We furthermore find that combining the two bonuses can lead to more robust performance across different degrees of shared structure, and investigate different algorithmic choices for defining and combining global and episodic bonuses based on function approximation. This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma’s Revenge.}
}

Endnote

%0 Conference Paper
%T A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
%A Mikael Henaff
%A Minqi Jiang
%A Roberta Raileanu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-henaff23a
%I PMLR
%P 12972--12999
%U https://proceedings.mlr.press/v202/henaff23a.html
%V 202
%X Exploration in environments which differ across episodes has received increasing attention in recent years. Current methods use some combination of global novelty bonuses, computed using the agent’s entire training experience, and episodic novelty bonuses, computed using only experience from the current episode. However, the use of these two types of bonuses has been ad-hoc and poorly understood. In this work, we shed light on the behavior of these two types of bonuses through controlled experiments on easily interpretable tasks as well as challenging pixel-based settings. We find that the two types of bonuses succeed in different settings, with episodic bonuses being most effective when there is little shared structure across episodes and global bonuses being effective when more structure is shared. We develop a conceptual framework which makes this notion of shared structure precise by considering the variance of the value function across contexts, and which provides a unifying explanation of our empirical results. We furthermore find that combining the two bonuses can lead to more robust performance across different degrees of shared structure, and investigate different algorithmic choices for defining and combining global and episodic bonuses based on function approximation. This results in an algorithm which sets a new state of the art across 16 tasks from the MiniHack suite used in prior work, and also performs robustly on Habitat and Montezuma’s Revenge.

APA


Henaff, M., Jiang, M. & Raileanu, R. (2023). A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:12972-12999. Available from https://proceedings.mlr.press/v202/henaff23a.html.
