A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs

Kihyuk Hong, Ambuj Tewari
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:23751-23773, 2025.

Abstract

We study reinforcement learning for infinite-horizon average-reward linear MDPs. Previous work addresses this problem by approximating the average-reward setting with a discounted one and employing a value iteration-based algorithm that clips the value function to constrain its span, improving statistical efficiency. However, the clipping procedure requires computing the minimum of the value function over the entire state space, which is prohibitive since the state space in the linear MDP setting can be large or even infinite. In this paper, we introduce a value iteration method with an efficient clipping operation that computes the minimum of the value function only over the set of states visited by the algorithm. Our algorithm enjoys the same regret bound as previous work while being computationally efficient, with computational complexity independent of the size of the state space.
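The clipping mechanism described above can be illustrated with a short sketch. The following is a minimal toy example, not the paper's algorithm: it runs discounted value iteration on a small random tabular MDP (standing in for a linear MDP) and, after each Bellman backup, caps the value function at the minimum over a set of visited states plus a span bound. Every name here (span_bound, visited, clip_value, the random MDP itself) is an illustrative assumption.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 20, 3
gamma = 0.99        # discount factor used to approximate the average-reward objective
span_bound = 5.0    # hypothetical bound on the span of the value function

# Random tabular MDP standing in for a linear MDP: transitions P[a, s, s']
# (each row a distribution over next states) and rewards R[s, a] in [0, 1].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.uniform(size=(n_states, n_actions))

# Hypothetical set of states the algorithm has visited so far.
visited = sorted(rng.choice(n_states, size=6, replace=False))

def clip_value(v):
    # Cap v at (min over visited states) + span_bound. The minimum is taken
    # only over visited states, never over the full state space, which is
    # the computational point the abstract makes.
    ref = min(v[s] for s in visited)
    return np.minimum(v, ref + span_bound)

V = np.zeros(n_states)
for _ in range(500):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)  # Bellman backup
    V = clip_value(Q.max(axis=1))                 # greedy values, then clip

print("span of V over visited states:",
      max(V[s] for s in visited) - min(V[s] for s in visited))

The printed span over visited states is at most span_bound by construction. In the paper's linear MDP setting the value function is parameterized by features, so minimizing over the finitely many visited states is cheap, whereas a minimum over a large or infinite state space would not even be computable.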

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-hong25g,
  title     = {A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear {MDP}s},
  author    = {Hong, Kihyuk and Tewari, Ambuj},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {23751--23773},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/hong25g/hong25g.pdf},
  url       = {https://proceedings.mlr.press/v267/hong25g.html},
  abstract  = {We study reinforcement learning in infinite-horizon average-reward settings with linear MDPs. Previous work addresses this problem by approximating the average-reward setting by discounted setting and employing a value iteration-based algorithm that uses clipping to constrain the span of the value function for improved statistical efficiency. However, the clipping procedure requires computing the minimum of the value function over the entire state space, which is prohibitive since the state space in linear MDP setting can be large or even infinite. In this paper, we introduce a value iteration method with efficient clipping operation that only requires computing the minimum of value functions over the set of states visited by the algorithm. Our algorithm enjoys the same regret bound as the previous work while being computationally efficient, with computational complexity that is independent of the size of the state space.}
}
Endnote
%0 Conference Paper
%T A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs
%A Kihyuk Hong
%A Ambuj Tewari
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-hong25g
%I PMLR
%P 23751--23773
%U https://proceedings.mlr.press/v267/hong25g.html
%V 267
%X We study reinforcement learning in infinite-horizon average-reward settings with linear MDPs. Previous work addresses this problem by approximating the average-reward setting by discounted setting and employing a value iteration-based algorithm that uses clipping to constrain the span of the value function for improved statistical efficiency. However, the clipping procedure requires computing the minimum of the value function over the entire state space, which is prohibitive since the state space in linear MDP setting can be large or even infinite. In this paper, we introduce a value iteration method with efficient clipping operation that only requires computing the minimum of value functions over the set of states visited by the algorithm. Our algorithm enjoys the same regret bound as the previous work while being computationally efficient, with computational complexity that is independent of the size of the state space.
APA
Hong, K. & Tewari, A. (2025). A Computationally Efficient Algorithm for Infinite-Horizon Average-Reward Linear MDPs. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:23751-23773. Available from https://proceedings.mlr.press/v267/hong25g.html.
