A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent

Mehdi Jafarnia Jahromi; Rahul A Jain; Ashutosh Nayyar

A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent

Mehdi Jafarnia Jahromi, Rahul A Jain, Ashutosh Nayyar

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3880-3888, 2024.

Abstract

In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $\tilde\mathcal{O}(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $\tilde\mathcal{O}(\sqrt[3]{DS^2AT^2})$ by Wei et al., (2017) under the same assumption and matches the theoretical lower bound in $T$.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-jafarnia-jahromi24a,
  title = 	 {A {B}ayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent},
  author =       {Jafarnia Jahromi, Mehdi and A Jain, Rahul and Nayyar, Ashutosh},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3880--3888},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/jafarnia-jahromi24a/jafarnia-jahromi24a.pdf},
  url = 	 {https://proceedings.mlr.press/v238/jafarnia-jahromi24a.html},
  abstract = 	 {In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $\tilde\mathcal{O}(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $\tilde\mathcal{O}(\sqrt[3]{DS^2AT^2})$ by Wei et al., (2017) under the same assumption and matches the theoretical lower bound in $T$.}
}

Endnote

%0 Conference Paper
%T A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent
%A Mehdi Jafarnia Jahromi
%A Rahul A Jain
%A Ashutosh Nayyar
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li	
%F pmlr-v238-jafarnia-jahromi24a
%I PMLR
%P 3880--3888
%U https://proceedings.mlr.press/v238/jafarnia-jahromi24a.html
%V 238
%X In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves Bayesian regret bound of $\tilde\mathcal{O}(HS\sqrt{AT})$ in the infinite-horizon zero-sum stochastic games with average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent can not be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $\tilde\mathcal{O}(\sqrt[3]{DS^2AT^2})$ by Wei et al., (2017) under the same assumption and matches the theoretical lower bound in $T$.

APA

Jafarnia Jahromi, M., A Jain, R. & Nayyar, A.. (2024). A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3880-3888 Available from https://proceedings.mlr.press/v238/jafarnia-jahromi24a.html.

Related Material

Download PDF