Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

Yu-Guan Hsieh, Kimon Antonakopoulos, Panayotis Mertikopoulos
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:2388-2422, 2021.

Abstract

In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player’s perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: (\emph{i}) they do not require \emph{any} prior tuning or knowledge of the game; (\emph{ii}) they all achieve $\mathcal{O}(\sqrt{T})$ regret against arbitrary, adversarial opponents; and (\emph{iii}) they converge to the best response against convergent opponents. Also, if employed by all players, then (\emph{iv}) they guarantee $\mathcal{O}(1)$ \emph{social} regret; while (\emph{v}) the induced sequence of play converges to Nash equilibrium with $\mathcal{O}(1)$ \emph{individual} regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games).
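For concreteness, the snippet below is a minimal Python sketch of the kind of policy the abstract describes: optimistic mirror descent (here with the Euclidean mirror map, i.e., optimistic gradient) driven by an AdaGrad-norm-style adaptive step-size, run by both players of a toy bilinear zero-sum game. This is an illustrative assumption, not the authors' exact algorithm; in particular the adaptive rule, the helper names (project_ball, run_omd), and the payoff matrix A are placeholders chosen only for the example.

# Hedged sketch: optimistic mirror descent (Euclidean case) with an
# AdaGrad-norm adaptive step-size on min_x max_y x^T A y over unit balls.
# Not the paper's exact policy; everything below is illustrative.

import numpy as np

def project_ball(z, radius=1.0):
    """Euclidean projection onto the ball of the given radius (a simple mirror map)."""
    norm = np.linalg.norm(z)
    return z if norm <= radius else z * (radius / norm)

def run_omd(A, T=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x_base = rng.standard_normal(n)
    y_base = rng.standard_normal(m)
    gx_prev = np.zeros(n)          # last observed gradients (used as the optimistic guess)
    gy_prev = np.zeros(m)
    sum_sq_x = sum_sq_y = 0.0      # running gradient norms for the adaptive step-size

    for _ in range(T):
        # AdaGrad-norm style step-sizes: no game-dependent tuning required.
        eta_x = 1.0 / np.sqrt(1.0 + sum_sq_x)
        eta_y = 1.0 / np.sqrt(1.0 + sum_sq_y)

        # Optimistic (extrapolation) step: lead with the previously observed gradient.
        x_play = project_ball(x_base - eta_x * gx_prev)
        y_play = project_ball(y_base + eta_y * gy_prev)   # the maximizer ascends

        # Observe gradients at the played point (x minimizes, y maximizes x^T A y).
        gx = A @ y_play
        gy = A.T @ x_play

        # Base update with the freshly observed gradients.
        x_base = project_ball(x_base - eta_x * gx)
        y_base = project_ball(y_base + eta_y * gy)

        sum_sq_x += np.linalg.norm(gx) ** 2
        sum_sq_y += np.linalg.norm(gy) ** 2
        gx_prev, gy_prev = gx, gy

    return x_play, y_play

if __name__ == "__main__":
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching-pennies-like payoff
    x, y = run_omd(A)
    print("final strategies:", x, y)            # both players should approach the saddle point at the origin

In this monotone (bilinear) example the iterates of both players approach the game's saddle point, in line with the paper's convergence guarantees for variationally stable games, while the AdaGrad-style step-size removes the need to know any problem constants in advance.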

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-hsieh21a,
  title = {Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium},
  author = {Hsieh, Yu-Guan and Antonakopoulos, Kimon and Mertikopoulos, Panayotis},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = {2388--2422},
  year = {2021},
  editor = {Belkin, Mikhail and Kpotufe, Samory},
  volume = {134},
  series = {Proceedings of Machine Learning Research},
  month = {15--19 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v134/hsieh21a/hsieh21a.pdf},
  url = {https://proceedings.mlr.press/v134/hsieh21a.html},
  abstract = {In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player’s perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: (\emph{i}) they do not require \emph{any} prior tuning or knowledge of the game; (\emph{ii}) they all achieve $\mathcal{O}(\sqrt{T})$ regret against arbitrary, adversarial opponents; and (\emph{iii}) they converge to the best response against convergent opponents. Also, if employed by all players, then (\emph{iv}) they guarantee $\mathcal{O}(1)$ \emph{social} regret; while (\emph{v}) the induced sequence of play converges to Nash equilibrium with $\mathcal{O}(1)$ \emph{individual} regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games).}
}
Endnote
%0 Conference Paper
%T Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium
%A Yu-Guan Hsieh
%A Kimon Antonakopoulos
%A Panayotis Mertikopoulos
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-hsieh21a
%I PMLR
%P 2388--2422
%U https://proceedings.mlr.press/v134/hsieh21a.html
%V 134
%X In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player’s perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: (\emph{i}) they do not require \emph{any} prior tuning or knowledge of the game; (\emph{ii}) they all achieve $\mathcal{O}(\sqrt{T})$ regret against arbitrary, adversarial opponents; and (\emph{iii}) they converge to the best response against convergent opponents. Also, if employed by all players, then (\emph{iv}) they guarantee $\mathcal{O}(1)$ \emph{social} regret; while (\emph{v}) the induced sequence of play converges to Nash equilibrium with $\mathcal{O}(1)$ \emph{individual} regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games).
APA
Hsieh, Y., Antonakopoulos, K. & Mertikopoulos, P. (2021). Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:2388-2422. Available from https://proceedings.mlr.press/v134/hsieh21a.html.