Anti-Exploration by Random Network Distillation

Alexander Nikulin; Vladislav Kurenkov; Denis Tarasov; Sergey Kolesnikov

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:26228-26244, 2023.

Abstract

Despite the success of Random Network Distillation (RND) in various domains, it has been shown to be insufficiently discriminative to serve as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that discriminativity is not the issue: with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it achieves performance comparable to ensemble-based methods and outperforms ensemble-free approaches by a wide margin.
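
To make the idea of a FiLM-conditioned RND bonus concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that state features are scaled and shifted by action-dependent vectors (the usual FiLM form, gamma * h + beta) and that the bonus is the squared error between a predictor and a frozen random prior. The layer sizes, the initializer, and the choice to use the same FiLM form in both networks are illustrative assumptions; the paper's architecture may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(in_dim, out_dim):
    # Hypothetical initializer, chosen only for illustration.
    weights = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    return weights, np.zeros(out_dim)

def init_params(state_dim, action_dim, hidden_dim, embed_dim):
    return [
        init_linear(state_dim, hidden_dim),   # state feature layer
        init_linear(action_dim, hidden_dim),  # produces FiLM scale (gamma)
        init_linear(action_dim, hidden_dim),  # produces FiLM shift (beta)
        init_linear(hidden_dim, embed_dim),   # output embedding layer
    ]

def film_network(params, state, action):
    """Embed (state, action), modulating state features with FiLM."""
    (w_h, b_h), (w_gamma, b_gamma), (w_beta, b_beta), (w_out, b_out) = params
    hidden = np.tanh(state @ w_h + b_h)   # features computed from the state
    gamma = action @ w_gamma + b_gamma    # per-feature scale from the action
    beta = action @ w_beta + b_beta       # per-feature shift from the action
    modulated = np.maximum(gamma * hidden + beta, 0.0)  # FiLM, then ReLU
    return modulated @ w_out + b_out

def rnd_bonus(predictor_params, prior_params, state, action):
    """Per-sample squared error between the predictor and the frozen prior."""
    diff = (film_network(predictor_params, state, action)
            - film_network(prior_params, state, action))
    return np.sum(diff ** 2, axis=-1)

# Assumed dimensions, for illustration only.
state_dim, action_dim = 17, 6
prior = init_params(state_dim, action_dim, 32, 8)      # frozen, never trained
predictor = init_params(state_dim, action_dim, 32, 8)  # would be fit on dataset pairs

states = rng.normal(size=(4, state_dim))
actions = rng.uniform(-1.0, 1.0, size=(4, action_dim))
bonus = rnd_bonus(predictor, prior, states, actions)
```

In an anti-exploration setup of this kind, the predictor would be trained to match the prior on dataset state-action pairs, so the bonus stays small in-distribution and the actor is penalized for actions where it is large. The training loop and the Soft Actor-Critic integration are omitted here.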

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-nikulin23a,
  title     = {Anti-Exploration by Random Network Distillation},
  author    = {Nikulin, Alexander and Kurenkov, Vladislav and Tarasov, Denis and Kolesnikov, Sergey},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {26228--26244},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/nikulin23a/nikulin23a.pdf},
  url       = {https://proceedings.mlr.press/v202/nikulin23a.html},
  abstract = 	 {Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.}
}

Endnote

%0 Conference Paper
%T Anti-Exploration by Random Network Distillation
%A Alexander Nikulin
%A Vladislav Kurenkov
%A Denis Tarasov
%A Sergey Kolesnikov
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-nikulin23a
%I PMLR
%P 26228--26244
%U https://proceedings.mlr.press/v202/nikulin23a.html
%V 202
%X Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.

APA

Nikulin, A., Kurenkov, V., Tarasov, D. & Kolesnikov, S. (2023). Anti-Exploration by Random Network Distillation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:26228-26244. Available from https://proceedings.mlr.press/v202/nikulin23a.html.
