Adapting to game trees in zero-sum imperfect information games

Côme Fiegel, Pierre Menard, Tadashi Kozuno, Remi Munos, Vianney Perchet, Michal Valko
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10093-10135, 2023.

Abstract

Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL, which needs $\widetilde{\mathcal{O}}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.
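
A short, hedged sketch may help place the two algorithms: both are instances of Follow the Regularized Leader (FTRL), which plays the strategy minimizing the cumulated estimated losses plus a regularizer. The Python sketch below shows only the generic FTRL update on the probability simplex with a negative-entropy regularizer, whose argmin has the closed form of exponential weights. It is not the paper's Balanced or Adaptive FTRL (their regularizers depend on the information set structure), and the learning rate eta and the uniform random losses are placeholder assumptions.

import numpy as np

def ftrl_simplex_update(cumulative_losses: np.ndarray, eta: float) -> np.ndarray:
    """Generic FTRL step over the simplex with negative entropy:
    x_{t+1} = argmin_x  eta * <L_t, x> + sum_i x_i log x_i,
    whose minimizer is x_{t+1,i} proportional to exp(-eta * L_{t,i})."""
    logits = -eta * cumulative_losses
    logits -= logits.max()  # shift for numerical stability; leaves the result unchanged
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy usage with placeholder losses (not trajectory feedback from an IIG).
rng = np.random.default_rng(0)
n_actions, eta = 3, 0.5
cumulative_losses = np.zeros(n_actions)
for t in range(100):
    strategy = ftrl_simplex_update(cumulative_losses, eta)
    cumulative_losses += rng.uniform(size=n_actions)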

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-fiegel23a,
  title     = {Adapting to game trees in zero-sum imperfect information games},
  author    = {Fiegel, C\^{o}me and Menard, Pierre and Kozuno, Tadashi and Munos, Remi and Perchet, Vianney and Valko, Michal},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {10093--10135},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/fiegel23a/fiegel23a.pdf},
  url       = {https://proceedings.mlr.press/v202/fiegel23a.html},
  abstract  = {Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total number of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced FTRL which matches this lower bound, but requires the knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL which needs $\widetilde{\mathcal{O}}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.}
}
Endnote
%0 Conference Paper
%T Adapting to game trees in zero-sum imperfect information games
%A Côme Fiegel
%A Pierre Menard
%A Tadashi Kozuno
%A Remi Munos
%A Vianney Perchet
%A Michal Valko
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-fiegel23a
%I PMLR
%P 10093--10135
%U https://proceedings.mlr.press/v202/fiegel23a.html
%V 202
%X Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total number of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced FTRL which matches this lower bound, but requires the knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL which needs $\widetilde{\mathcal{O}}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.
APA
Fiegel, C., Menard, P., Kozuno, T., Munos, R., Perchet, V. & Valko, M. (2023). Adapting to game trees in zero-sum imperfect information games. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10093-10135. Available from https://proceedings.mlr.press/v202/fiegel23a.html.