Provably Efficient Model-Free Algorithms for Non-stationary CMDPs

Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, Xingyu Zhou
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6527-6570, 2023.

Abstract

We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov decision processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost). In the non-stationary environment, the reward and utility functions and the transition kernels can vary arbitrarily over time as long as the cumulative variations do not exceed certain variation budgets. We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs in both tabular and linear function approximation settings, with provable performance guarantees. Our regret bound and constraint violation results for the tabular case match the corresponding best results for stationary CMDPs when the total budget is known. Additionally, we present a general framework for addressing the well-known challenges associated with analyzing non-stationary CMDPs, without requiring prior knowledge of the variation budget. We apply the approach to both tabular and linear approximation settings.
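The objective described in the abstract can be formalized in standard episodic CMDP notation (the symbols below are generic and not taken from the paper): in each episode the agent chooses a policy to solve

\[
\max_{\pi} \ \mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} r_h(s_h, a_h)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} g_h(s_h, a_h)\Big] \ge b,
\]

where $H$ is the horizon, $r_h$ the reward function, $g_h$ the utility function, and $b$ the constraint threshold. In the non-stationary setting, $r_h$, $g_h$, and the transition kernels may change from episode to episode, with the cumulative change bounded by the variation budgets.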

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-wei23b,
  title     = {Provably Efficient Model-Free Algorithms for Non-stationary CMDPs},
  author    = {Wei, Honghao and Ghosh, Arnob and Shroff, Ness and Ying, Lei and Zhou, Xingyu},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {6527--6570},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/wei23b/wei23b.pdf},
  url       = {https://proceedings.mlr.press/v206/wei23b.html},
  abstract  = {We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov decision processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost). In the non-stationary environment, the reward and utility functions and the transition kernels can vary arbitrarily over time as long as the cumulative variations do not exceed certain variation budgets. We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs in both tabular and linear function approximation settings, with provable performance guarantees. Our regret bound and constraint violation results for the tabular case match the corresponding best results for stationary CMDPs when the total budget is known. Additionally, we present a general framework for addressing the well-known challenges associated with analyzing non-stationary CMDPs, without requiring prior knowledge of the variation budget. We apply the approach to both tabular and linear approximation settings.}
}
Endnote
%0 Conference Paper
%T Provably Efficient Model-Free Algorithms for Non-stationary CMDPs
%A Honghao Wei
%A Arnob Ghosh
%A Ness Shroff
%A Lei Ying
%A Xingyu Zhou
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-wei23b
%I PMLR
%P 6527--6570
%U https://proceedings.mlr.press/v206/wei23b.html
%V 206
%X We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov decision processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost). In the non-stationary environment, the reward and utility functions and the transition kernels can vary arbitrarily over time as long as the cumulative variations do not exceed certain variation budgets. We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs in both tabular and linear function approximation settings, with provable performance guarantees. Our regret bound and constraint violation results for the tabular case match the corresponding best results for stationary CMDPs when the total budget is known. Additionally, we present a general framework for addressing the well-known challenges associated with analyzing non-stationary CMDPs, without requiring prior knowledge of the variation budget. We apply the approach to both tabular and linear approximation settings.
APA
Wei, H., Ghosh, A., Shroff, N., Ying, L., & Zhou, X. (2023). Provably Efficient Model-Free Algorithms for Non-stationary CMDPs. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:6527-6570. Available from https://proceedings.mlr.press/v206/wei23b.html.