Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces

Avik Kar, Rahul Singh
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:1924-1964, 2025.

Abstract

We study infinite-horizon average-reward reinforcement learning (RL) for Lipschitz MDPs, a broad class that subsumes several important classes such as linear MDPs, RKHS MDPs, and other function-approximation frameworks, and develop an adaptive algorithm $\text{ZoRL}$ with regret bounded as $\mathcal{O}\big(T^{1 - d_{\text{eff.}}^{-1}}\big)$, where $d_{\text{eff.}} = 2d_\mathcal{S} + d_z + 3$, $d_\mathcal{S}$ is the dimension of the state space, and $d_z$ is the zooming dimension. In contrast, algorithms with fixed discretization yield $d_{\text{eff.}} = 2(d_\mathcal{S} + d_\mathcal{A}) + 2$, where $d_\mathcal{A}$ is the dimension of the action space. $\text{ZoRL}$ achieves this by discretizing the state-action space adaptively and zooming into "promising regions" of the state-action space. Since $d_z$ is a problem-dependent quantity bounded by the dimension of the state-action space, the regret of $\text{ZoRL}$ is small whenever the MDP is benign. The zooming dimension and $\text{ZoRL}$ are truly adaptive; that is, the present work shows how to capture adaptivity gains for infinite-horizon average-reward RL. $\text{ZoRL}$ outperforms other state-of-the-art algorithms in experiments, thereby demonstrating the gains arising from adaptivity.
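
The regret comparison in the abstract comes down to comparing effective dimensions: since $1 - d_{\text{eff.}}^{-1}$ is increasing in $d_{\text{eff.}}$, a smaller effective dimension means a smaller regret exponent, and the adaptive bound wins whenever $d_z < 2d_\mathcal{A} - 1$. The following minimal Python sketch (not from the paper; the function names and dimension values are hypothetical) evaluates the two exponents implied by the formulas above for a few zooming dimensions.

```python
# Minimal sketch comparing regret exponents implied by the effective dimensions in the
# abstract: adaptive (ZoRL) d_eff = 2*d_S + d_z + 3 vs. fixed d_eff = 2*(d_S + d_A) + 2,
# with regret scaling as O(T^(1 - 1/d_eff)) in both cases.

def regret_exponent(d_eff: float) -> float:
    """Exponent of T in the regret bound O(T^(1 - 1/d_eff))."""
    return 1.0 - 1.0 / d_eff

def adaptive_d_eff(d_S: int, d_z: int) -> int:
    """Effective dimension of the adaptive (zooming) bound."""
    return 2 * d_S + d_z + 3

def fixed_d_eff(d_S: int, d_A: int) -> int:
    """Effective dimension under a fixed (uniform) discretization."""
    return 2 * (d_S + d_A) + 2

if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only; d_z is at most d_S + d_A.
    d_S, d_A = 2, 2
    for d_z in range(0, d_S + d_A + 1):
        e_adapt = regret_exponent(adaptive_d_eff(d_S, d_z))
        e_fixed = regret_exponent(fixed_d_eff(d_S, d_A))
        print(f"d_z={d_z}: adaptive T^{e_adapt:.3f} vs fixed T^{e_fixed:.3f}")
```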

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-kar25a,
  title     = {Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces},
  author    = {Kar, Avik and Singh, Rahul},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages     = {1924--1964},
  year      = {2025},
  editor    = {Chiappa, Silvia and Magliacane, Sara},
  volume    = {286},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--25 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/kar25a/kar25a.pdf},
  url       = {https://proceedings.mlr.press/v286/kar25a.html},
  abstract  = {We study infinite-horizon average-reward reinforcement learning (RL) for Lipschitz MDPs, a broad class that subsumes several important classes such as linear and RKHS MDPs, function approximation frameworks, and develop an adaptive algorithm $\text{ZoRL}$ with regret bounded as $\mathcal{O}\big(T^{1 - d_{\text{eff.}}^{-1}}\big)$, where $d_{\text{eff.}}= 2d_\mathcal{S} + d_z + 3$, $d_\mathcal{S}$ is the dimension of the state space and $d_z$ is the zooming dimension. In contrast, algorithms with fixed discretization yield $d_{\text{eff.}} = 2(d_\mathcal{S} + d_\mathcal{A}) + 2$, $d_\mathcal{A}$ being the dimension of action space. $\text{ZoRL}$ achieves this by discretizing the state-action space adaptively and zooming into ``promising regions'' of the state-action space. $d_z$, a problem-dependent quantity bounded by the state-action space's dimension, allows us to conclude that if an MDP is benign, then the regret of $\text{ZoRL}$ will be small. The zooming dimension and $\text{ZoRL}$ are truly adaptive, i.e., the current work shows how to capture adaptivity gains for infinite-horizon average-reward RL. $\text{ZoRL}$ outperforms other state-of-the-art algorithms in experiments, thereby demonstrating the gains arising due to adaptivity.}
}
Endnote
%0 Conference Paper
%T Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
%A Avik Kar
%A Rahul Singh
%B Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2025
%E Silvia Chiappa
%E Sara Magliacane
%F pmlr-v286-kar25a
%I PMLR
%P 1924--1964
%U https://proceedings.mlr.press/v286/kar25a.html
%V 286
%X We study infinite-horizon average-reward reinforcement learning (RL) for Lipschitz MDPs, a broad class that subsumes several important classes such as linear and RKHS MDPs, function approximation frameworks, and develop an adaptive algorithm $\text{ZoRL}$ with regret bounded as $\mathcal{O}\big(T^{1 - d_{\text{eff.}}^{-1}}\big)$, where $d_{\text{eff.}}= 2d_\mathcal{S} + d_z + 3$, $d_\mathcal{S}$ is the dimension of the state space and $d_z$ is the zooming dimension. In contrast, algorithms with fixed discretization yield $d_{\text{eff.}} = 2(d_\mathcal{S} + d_\mathcal{A}) + 2$, $d_\mathcal{A}$ being the dimension of action space. $\text{ZoRL}$ achieves this by discretizing the state-action space adaptively and zooming into "promising regions" of the state-action space. $d_z$, a problem-dependent quantity bounded by the state-action space's dimension, allows us to conclude that if an MDP is benign, then the regret of $\text{ZoRL}$ will be small. The zooming dimension and $\text{ZoRL}$ are truly adaptive, i.e., the current work shows how to capture adaptivity gains for infinite-horizon average-reward RL. $\text{ZoRL}$ outperforms other state-of-the-art algorithms in experiments, thereby demonstrating the gains arising due to adaptivity.
APA
Kar, A. & Singh, R. (2025). Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:1924-1964. Available from https://proceedings.mlr.press/v286/kar25a.html.
