Near-Optimal $Φ$-Regret Learning in Extensive-Form Games

Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:814-839, 2023.

Abstract

In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of extensive-form correlated equilibria and coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with polynomial degree, a property that we establish for the fixed points of (coarse) trigger deviation functions. Moreover, our construction leverages a refined regret circuit for the convex hull, which—unlike prior guarantees—preserves the RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation has an independent interest in establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-anagnostides23a, title = {Near-Optimal $Φ$-Regret Learning in Extensive-Form Games}, author = {Anagnostides, Ioannis and Farina, Gabriele and Sandholm, Tuomas}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, pages = {814--839}, year = {2023}, editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan}, volume = {202}, series = {Proceedings of Machine Learning Research}, month = {23--29 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v202/anagnostides23a/anagnostides23a.pdf}, url = {https://proceedings.mlr.press/v202/anagnostides23a.html}, abstract = {In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of extensive-form correlated equilibria and coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with polynomial degree, a property that we establish for the fixed points of (coarse) trigger deviation functions. Moreover, our construction leverages a refined regret circuit for the convex hull, which—unlike prior guarantees—preserves the RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation has an independent interest in establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.} }
Endnote
%0 Conference Paper %T Near-Optimal $Φ$-Regret Learning in Extensive-Form Games %A Ioannis Anagnostides %A Gabriele Farina %A Tuomas Sandholm %B Proceedings of the 40th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2023 %E Andreas Krause %E Emma Brunskill %E Kyunghyun Cho %E Barbara Engelhardt %E Sivan Sabato %E Jonathan Scarlett %F pmlr-v202-anagnostides23a %I PMLR %P 814--839 %U https://proceedings.mlr.press/v202/anagnostides23a.html %V 202 %X In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of extensive-form correlated equilibria and coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with polynomial degree, a property that we establish for the fixed points of (coarse) trigger deviation functions. Moreover, our construction leverages a refined regret circuit for the convex hull, which—unlike prior guarantees—preserves the RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation has an independent interest in establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.
APA
Anagnostides, I., Farina, G. & Sandholm, T.. (2023). Near-Optimal $Φ$-Regret Learning in Extensive-Form Games. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:814-839 Available from https://proceedings.mlr.press/v202/anagnostides23a.html.

Related Material