Stable-Predictive Optimistic Counterfactual Regret Minimization

Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:1853-1862, 2019.

Abstract

The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice. However, the theoretical rate at which past CFR-based algorithms converge to the Nash equilibrium is on the order of $O(T^{-1/2})$, where $T$ is the number of iterations. In contrast, first-order methods can be used to achieve a $O(T^{-1})$ dependence on iterations, yet these methods have been less successful in practice. In this work we present the first CFR variant that breaks the square-root dependence on iterations. By combining and extending recent advances on predictive and stable regret minimizers for the matrix-game setting we show that it is possible to leverage “optimistic” regret minimizers to achieve a $O(T^{-3/4})$ convergence rate within CFR. This is achieved by introducing a new notion of stable-predictivity, and by setting the stability of each counterfactual regret minimizer relative to its location in the decision tree. Experiments show that this method is faster than the original CFR algorithm, although not as fast as newer variants, in spite of their worst-case $O(T^{-1/2})$ dependence on iterations.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-farina19a, title = {Stable-Predictive Optimistic Counterfactual Regret Minimization}, author = {Farina, Gabriele and Kroer, Christian and Brown, Noam and Sandholm, Tuomas}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {1853--1862}, year = {2019}, editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan}, volume = {97}, series = {Proceedings of Machine Learning Research}, month = {09--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v97/farina19a/farina19a.pdf}, url = {https://proceedings.mlr.press/v97/farina19a.html}, abstract = {The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice. However, the theoretical rate at which past CFR-based algorithms converge to the Nash equilibrium is on the order of $O(T^{-1/2})$, where $T$ is the number of iterations. In contrast, first-order methods can be used to achieve a $O(T^{-1})$ dependence on iterations, yet these methods have been less successful in practice. In this work we present the first CFR variant that breaks the square-root dependence on iterations. By combining and extending recent advances on predictive and stable regret minimizers for the matrix-game setting we show that it is possible to leverage “optimistic” regret minimizers to achieve a $O(T^{-3/4})$ convergence rate within CFR. This is achieved by introducing a new notion of stable-predictivity, and by setting the stability of each counterfactual regret minimizer relative to its location in the decision tree. Experiments show that this method is faster than the original CFR algorithm, although not as fast as newer variants, in spite of their worst-case $O(T^{-1/2})$ dependence on iterations.} }
Endnote
%0 Conference Paper %T Stable-Predictive Optimistic Counterfactual Regret Minimization %A Gabriele Farina %A Christian Kroer %A Noam Brown %A Tuomas Sandholm %B Proceedings of the 36th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Ruslan Salakhutdinov %F pmlr-v97-farina19a %I PMLR %P 1853--1862 %U https://proceedings.mlr.press/v97/farina19a.html %V 97 %X The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice. However, the theoretical rate at which past CFR-based algorithms converge to the Nash equilibrium is on the order of $O(T^{-1/2})$, where $T$ is the number of iterations. In contrast, first-order methods can be used to achieve a $O(T^{-1})$ dependence on iterations, yet these methods have been less successful in practice. In this work we present the first CFR variant that breaks the square-root dependence on iterations. By combining and extending recent advances on predictive and stable regret minimizers for the matrix-game setting we show that it is possible to leverage “optimistic” regret minimizers to achieve a $O(T^{-3/4})$ convergence rate within CFR. This is achieved by introducing a new notion of stable-predictivity, and by setting the stability of each counterfactual regret minimizer relative to its location in the decision tree. Experiments show that this method is faster than the original CFR algorithm, although not as fast as newer variants, in spite of their worst-case $O(T^{-1/2})$ dependence on iterations.
APA
Farina, G., Kroer, C., Brown, N. & Sandholm, T.. (2019). Stable-Predictive Optimistic Counterfactual Regret Minimization. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1853-1862 Available from https://proceedings.mlr.press/v97/farina19a.html.

Related Material