A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent

Mehdi Jafarnia Jahromi, Rahul A Jain, Ashutosh Nayyar
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3880-3888, 2024.

Abstract

In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $\tilde\mathcal{O}(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $\tilde\mathcal{O}(\sqrt[3]{DS^2AT^2})$ by Wei et al., (2017) under the same assumption and matches the theoretical lower bound in $T$.

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-jafarnia-jahromi24a, title = {A {B}ayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent}, author = {Jafarnia Jahromi, Mehdi and A Jain, Rahul and Nayyar, Ashutosh}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics}, pages = {3880--3888}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, month = {02--04 May}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v238/jafarnia-jahromi24a/jafarnia-jahromi24a.pdf}, url = {https://proceedings.mlr.press/v238/jafarnia-jahromi24a.html}, abstract = {In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $\tilde\mathcal{O}(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $\tilde\mathcal{O}(\sqrt[3]{DS^2AT^2})$ by Wei et al., (2017) under the same assumption and matches the theoretical lower bound in $T$.} }
Endnote
%0 Conference Paper %T A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent %A Mehdi Jafarnia Jahromi %A Rahul A Jain %A Ashutosh Nayyar %B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2024 %E Sanjoy Dasgupta %E Stephan Mandt %E Yingzhen Li %F pmlr-v238-jafarnia-jahromi24a %I PMLR %P 3880--3888 %U https://proceedings.mlr.press/v238/jafarnia-jahromi24a.html %V 238 %X In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $\tilde\mathcal{O}(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $\tilde\mathcal{O}(\sqrt[3]{DS^2AT^2})$ by Wei et al., (2017) under the same assumption and matches the theoretical lower bound in $T$.
APA
Jafarnia Jahromi, M., A Jain, R. & Nayyar, A.. (2024). A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3880-3888 Available from https://proceedings.mlr.press/v238/jafarnia-jahromi24a.html.

Related Material