Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation

Canzhe Zhao, Shuze Chen, Weiming Liu, Haobo Fu, Qiang Fu, Shuai Li
Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, PMLR 286:5058-5083, 2025.

Abstract

Despite significant advances in learning imperfect information extensive-form games (IIEFGs), most existing theoretical guarantees are limited to the tabular case. To enable efficient learning of large-scale IIEFGs, we take the first step in studying two-player zero-sum IIEFGs with linear function approximation. In particular, we consider linear IIEFGs formulated as partially observable Markov games (POMGs) with linearly parameterized rewards. Because imperfect information over states makes the underlying function approximation structure difficult to apply directly, we construct composite "feature vectors" for information set-action pairs. Building on these features, we propose a least-squares loss estimator, which we call the *fictitious* least-squares loss estimator. By integrating this estimator with the follow-the-regularized-leader (FTRL) framework, we obtain the *fictitious* least-squares follow-the-regularized-leader ($\text{F}^2\text{TRL}$) algorithm, which achieves a provable $\widetilde{\mathcal{O}}(\lambda\sqrt{d H^2 T})$ regret guarantee in the large-$T$ regime, where $d$ is the ambient dimension of the feature mapping, $H$ is the horizon length, $\lambda$ is a "balance coefficient", and $T$ is the number of episodes. At the core of the analysis of $\text{F}^2\text{TRL}$ is our newly proposed "balanced transition" over the information set-action space. We complement these results with an $\Omega(\sqrt{d\min(d,H)T})$ regret lower bound for this problem and with empirical evaluations across various environments, which corroborate the effectiveness of our algorithm.
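
The abstract only sketches the method at a high level, but the core recipe it names, a least-squares loss estimator fed into FTRL, follows a well-known pattern from adversarial linear bandits. Below is a minimal, hypothetical single-agent sketch of that pattern in Python; it is *not* the paper's $\text{F}^2\text{TRL}$ (which operates over information set-action pairs and uses the "balanced transition"), and all dimensions, the learning rate, and the ridge term `gamma` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- hypothetical setup, for illustration only ---
d, K, T = 4, 8, 2000                # feature dimension, number of actions, episodes
Phi = rng.normal(size=(K, d))       # feature vector phi(a) for each action a
theta_star = rng.normal(size=d)     # unknown parameter: expected loss(a) = <phi(a), theta*>

eta = np.sqrt(np.log(K) / (d * T))  # FTRL learning rate (heuristic choice)
gamma = 1e-3                        # small ridge term keeping the covariance invertible

cum_loss_hat = np.zeros(K)          # cumulative estimated losses fed to FTRL

for t in range(T):
    # FTRL with an entropy regularizer reduces to exponential weights
    # over the cumulative estimated losses.
    w = np.exp(-eta * (cum_loss_hat - cum_loss_hat.min()))
    pi = w / w.sum()

    # Play one action; only its (bandit-feedback) loss is observed.
    a = rng.choice(K, p=pi)
    loss = Phi[a] @ theta_star + 0.01 * rng.normal()

    # Least-squares loss estimator: theta_hat = Sigma^{-1} phi(a) * loss,
    # where Sigma = E_{a ~ pi}[phi(a) phi(a)^T] is the feature covariance under pi.
    Sigma = (Phi.T * pi) @ Phi + gamma * np.eye(d)
    theta_hat = np.linalg.solve(Sigma, Phi[a] * loss)

    # The estimator extrapolates a loss estimate for every action, played or not.
    cum_loss_hat += Phi @ theta_hat
```

The point of the least-squares estimator is that a single observed loss yields an estimate $\hat{\theta}_t = \Sigma_t^{-1}\phi(a_t)\ell_t$ of the entire loss vector, so FTRL can update the probabilities even of actions that were never played.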

Cite this Paper


BibTeX
@InProceedings{pmlr-v286-zhao25b,
  title     = {Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation},
  author    = {Zhao, Canzhe and Chen, Shuze and Liu, Weiming and Fu, Haobo and Fu, Qiang and Li, Shuai},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  pages     = {5058--5083},
  year      = {2025},
  editor    = {Chiappa, Silvia and Magliacane, Sara},
  volume    = {286},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--25 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v286/main/assets/zhao25b/zhao25b.pdf},
  url       = {https://proceedings.mlr.press/v286/zhao25b.html}
}
APA
Zhao, C., Chen, S., Liu, W., Fu, H., Fu, Q. & Li, S. (2025). Towards Provably Efficient Learning of Imperfect Information Extensive-Form Games with Linear Function Approximation. Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 286:5058-5083. Available from https://proceedings.mlr.press/v286/zhao25b.html.
