A Reduction Framework for Distributionally Robust Reinforcement Learning under Average Reward

Zachary Andrew Roch, George K. Atia, Yue Wang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:51809-51850, 2025.

Abstract

Robust reinforcement learning (RL) under the average reward criterion, which seeks to optimize long-term system performance in uncertain environments, remains a largely unexplored area. To address this challenge, we propose a reduction-based framework that transforms robust average reward optimization into the more extensively studied robust discounted reward optimization by employing a specific discount factor. Our framework provides two key advantages. Data Efficiency: We design a model-based reduction algorithm that achieves near-optimal sample complexity, enabling efficient identification of optimal robust policies; Scalability: By bypassing the inherent challenges of scaling up average reward optimization, our framework facilitates the design of scalable, convergent algorithms for robust average reward optimization leveraging function approximation. Our algorithmic design, supported by theoretical and empirical analyses, provides a concrete solution to robust average reward RL with the first data efficiency and scalability guarantees, highlighting the framework’s potential to optimize long-term performance under model uncertainty in practical problems.
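
For intuition only (this sketch is ours and is not taken from the paper): reductions of this kind build on the classical relation between the average-reward and discounted criteria. For a stationary policy \pi in a unichain MDP with reward r and discount factor \gamma \in (0,1),

\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\pi}\Big[\sum_{t=0}^{T-1} r(s_t, a_t)\Big],
\qquad
V^{\pi}_{\gamma}(s) \;=\; \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \,\Big|\, s_0 = s\Big],
\qquad
\rho^{\pi} \;=\; \lim_{\gamma \to 1} (1-\gamma)\, V^{\pi}_{\gamma}(s),

so a discount factor sufficiently close to 1 makes the discounted problem a proxy for the average-reward one. The paper's reduction employs a specific, explicitly chosen discount factor so that a robust discounted solver can be reused for the robust average-reward problem; the exact choice and the accompanying sample-complexity and convergence guarantees are given in the paper.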

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-roch25a,
  title     = {A Reduction Framework for Distributionally Robust Reinforcement Learning under Average Reward},
  author    = {Roch, Zachary Andrew and Atia, George K. and Wang, Yue},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {51809--51850},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/roch25a/roch25a.pdf},
  url       = {https://proceedings.mlr.press/v267/roch25a.html},
  abstract  = {Robust reinforcement learning (RL) under the average reward criterion, which seeks to optimize long-term system performance in uncertain environments, remains a largely unexplored area. To address this challenge, we propose a reduction-based framework that transforms robust average reward optimization into the more extensively studied robust discounted reward optimization by employing a specific discount factor. Our framework provides two key advantages. Data Efficiency: We design a model-based reduction algorithm that achieves near-optimal sample complexity, enabling efficient identification of optimal robust policies; Scalability: By bypassing the inherent challenges of scaling up average reward optimization, our framework facilitates the design of scalable, convergent algorithms for robust average reward optimization leveraging function approximation. Our algorithmic design, supported by theoretical and empirical analyses, provides a concrete solution to robust average reward RL with the first data efficiency and scalability guarantees, highlighting the framework’s potential to optimize long-term performance under model uncertainty in practical problems.}
}
Endnote
%0 Conference Paper
%T A Reduction Framework for Distributionally Robust Reinforcement Learning under Average Reward
%A Zachary Andrew Roch
%A George K. Atia
%A Yue Wang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-roch25a
%I PMLR
%P 51809--51850
%U https://proceedings.mlr.press/v267/roch25a.html
%V 267
%X Robust reinforcement learning (RL) under the average reward criterion, which seeks to optimize long-term system performance in uncertain environments, remains a largely unexplored area. To address this challenge, we propose a reduction-based framework that transforms robust average reward optimization into the more extensively studied robust discounted reward optimization by employing a specific discount factor. Our framework provides two key advantages. Data Efficiency: We design a model-based reduction algorithm that achieves near-optimal sample complexity, enabling efficient identification of optimal robust policies; Scalability: By bypassing the inherent challenges of scaling up average reward optimization, our framework facilitates the design of scalable, convergent algorithms for robust average reward optimization leveraging function approximation. Our algorithmic design, supported by theoretical and empirical analyses, provides a concrete solution to robust average reward RL with the first data efficiency and scalability guarantees, highlighting the framework’s potential to optimize long-term performance under model uncertainty in practical problems.
APA
Roch, Z.A., Atia, G.K. & Wang, Y. (2025). A Reduction Framework for Distributionally Robust Reinforcement Learning under Average Reward. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:51809-51850. Available from https://proceedings.mlr.press/v267/roch25a.html.