Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent

Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:29022-29062, 2024.

Abstract

We propose HyperAgent, a reinforcement learning (RL) algorithm based on the hypermodel framework for exploration in RL. HyperAgent enables efficient incremental approximation of posteriors over the optimal action-value function ($Q^\star$) without requiring conjugacy, and it acts greedily with respect to these approximate posterior samples. We demonstrate that HyperAgent offers robust performance on large-scale deep RL benchmarks: it solves Deep Sea hard-exploration problems using a number of episodes that scales optimally with problem size, and it exhibits significant efficiency gains on the Atari suite. Implementing HyperAgent requires only minimal additional code on top of well-established deep RL frameworks such as DQN. We prove that, under tabular assumptions, HyperAgent achieves logarithmic per-step computational complexity while attaining sublinear regret, matching the best known randomized tabular RL algorithm.
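To make the hypermodel idea in the abstract concrete, the sketch below is an illustrative reconstruction (not the authors' implementation) of a tabular agent in NumPy: it maintains a mean table mu and index weights M so that Q(s, a, xi) = mu[s, a] + M[s, a] @ xi, samples one index xi per episode, acts greedily with respect to that sampled Q function, and updates mu and M incrementally toward noise-perturbed TD targets. All names and hyperparameters (index_dim, noise_scale, prior_scale, lr) are assumptions made for this sketch.

```python
import numpy as np

# Illustrative sketch (not the paper's code) of the hypermodel idea described in
# the abstract: a distribution over Q functions is represented as
# Q(s, a, xi) = mu[s, a] + M[s, a] @ xi with xi ~ N(0, I). Sampling xi once per
# episode and acting greedily w.r.t. Q(., ., xi) approximates posterior sampling
# over Q* without conjugacy. Hyperparameter names/values here are assumptions.
class TabularHyperAgentSketch:
    def __init__(self, n_states, n_actions, index_dim=4, gamma=0.99,
                 lr=0.1, prior_scale=1.0, noise_scale=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.gamma, self.lr, self.noise_scale = gamma, lr, noise_scale
        self.index_dim = index_dim
        # Mean Q-values and per-(s, a) index weights define the sampled Q functions.
        self.mu = np.zeros((n_states, n_actions))
        self.M = prior_scale * self.rng.standard_normal((n_states, n_actions, index_dim))
        self.xi = self.sample_index()

    def sample_index(self):
        # One index sample ~ N(0, I) corresponds to one sampled Q function.
        return self.rng.standard_normal(self.index_dim)

    def q_values(self, s, xi):
        # Q(s, a, xi) for all actions a in state s.
        return self.mu[s] + self.M[s] @ xi

    def act(self, s):
        # Greedy action w.r.t. the currently sampled Q function.
        return int(np.argmax(self.q_values(s, self.xi)))

    def update(self, s, a, r, s_next, done):
        # Incremental step toward a noise-perturbed TD target; the index-dependent
        # perturbation keeps the sampled Q functions diverse, giving the
        # "incremental approximation of posteriors" without conjugacy.
        z = self.rng.standard_normal(self.index_dim)  # per-transition noise index
        target = r + self.noise_scale * (z @ self.xi)
        if not done:
            target += self.gamma * np.max(self.q_values(s_next, self.xi))
        td_error = target - (self.mu[s, a] + self.M[s, a] @ self.xi)
        self.mu[s, a] += self.lr * td_error
        self.M[s, a] += self.lr * td_error * self.xi  # gradient step w.r.t. M[s, a]

    def end_episode(self):
        # Resampling the index once per episode commits the agent to one
        # plausible Q* hypothesis for a whole episode (deep exploration).
        self.xi = self.sample_index()
```

In the deep-RL setting described in the abstract, the same pattern would sit on top of a DQN-style network (the per-(s, a) table replaced by a hypermodel head on the Q-network), which is why only a small amount of additional code is needed.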

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-li24by,
  title     = {Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via {H}yper{A}gent},
  author    = {Li, Yingru and Xu, Jiawei and Han, Lei and Luo, Zhi-Quan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {29022--29062},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24by/li24by.pdf},
  url       = {https://proceedings.mlr.press/v235/li24by.html}
}
Endnote
%0 Conference Paper
%T Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent
%A Yingru Li
%A Jiawei Xu
%A Lei Han
%A Zhi-Quan Luo
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-li24by
%I PMLR
%P 29022--29062
%U https://proceedings.mlr.press/v235/li24by.html
%V 235
APA
Li, Y., Xu, J., Han, L. & Luo, Z. (2024). Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:29022-29062. Available from https://proceedings.mlr.press/v235/li24by.html.