R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

Zhongxiang Dai, Yizhou Chen, Bryan Kian Hsiang Low, Patrick Jaillet, Teck-Hua Ho
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:2291-2301, 2020.

Abstract

This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.
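To make the level-k reasoning idea in the abstract concrete, the snippet below is a minimal, illustrative sketch of one round of recursive reasoning with Gaussian-process (GP) surrogates in a two-agent repeated game. It is not the authors' implementation: the discretized 1-D action grid, the GP-UCB-style acquisition, the uniform level-0 strategy, the scikit-learn GPs, and the toy payoff functions are all assumptions made purely for illustration. A level-1 agent best-responds to a conjectured opponent strategy under its own surrogate, while a level-2 agent first simulates the opponent's level-1 best response (using a model of the opponent's payoff) and then best-responds to that predicted action.

```python
# A minimal, illustrative sketch (not the authors' implementation) of one round of
# level-k recursive reasoning with GP surrogates in a two-agent repeated game.
# Assumed for illustration only: 1-D action spaces discretized on a grid, a
# GP-UCB-style acquisition, a uniform level-0 strategy, and toy payoff functions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 25)                          # candidate actions (both agents)
joint = np.array([[a, b] for a in grid for b in grid])    # all joint actions (x1, x2)

def uniform():
    """Level-0 opponent model: uniform mixed strategy over the action grid."""
    return np.full(len(grid), 1.0 / len(grid))

def expected_ucb(gp, own_axis, opp_strategy, beta=2.0):
    """Expected GP-UCB value of each own action against a mixed opponent strategy.
    own_axis is 0 if this agent's action is the first input of the GP, else 1."""
    mu, sigma = gp.predict(joint, return_std=True)
    vals = (mu + beta * sigma).reshape(len(grid), len(grid))  # rows: x1, cols: x2
    if own_axis == 1:
        vals = vals.T                                         # make rows the own action
    return vals @ opp_strategy

def level1_action(gp_own, own_axis, opp_strategy):
    """Level-1 reasoning: best-respond (via UCB) to a conjectured opponent strategy."""
    return grid[np.argmax(expected_ucb(gp_own, own_axis, opp_strategy))]

def level2_action(gp_own, gp_opp_model):
    """Level-2 reasoning for agent 1: simulate the opponent reasoning at level 1
    (best-responding to a uniform level-0 conjecture about us), then best-respond
    to that predicted opponent action."""
    predicted_opp = level1_action(gp_opp_model, 1, uniform())
    point_mass = (grid == predicted_opp).astype(float)        # degenerate mixed strategy
    return level1_action(gp_own, 0, point_mass)

# Surrogates fitted on a handful of past (joint action, payoff) observations;
# the payoffs below are synthetic stand-ins for expensive black-box evaluations.
X_obs = rng.uniform(size=(10, 2))
payoff1 = np.sin(3 * X_obs[:, 0]) * np.cos(3 * X_obs[:, 1])   # toy payoff of agent 1
payoff2 = -payoff1                                            # toy constant-sum opponent
gp1 = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, payoff1)
gp2 = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, payoff2)

print("Agent 1's level-2 action this round:", level2_action(gp1, gp2))
```

In a full repeated-game loop, each agent would play its selected action, observe its own payoff, update its GP surrogate(s), and repeat; the convergence guarantees stated in the abstract apply to the paper's actual acquisition and reasoning rules, not to this toy grid version.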

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-dai20a,
  title     = {R2-B2: Recursive Reasoning-Based {B}ayesian Optimization for No-Regret Learning in Games},
  author    = {Dai, Zhongxiang and Chen, Yizhou and Low, Bryan Kian Hsiang and Jaillet, Patrick and Ho, Teck-Hua},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {2291--2301},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/dai20a/dai20a.pdf},
  url       = {https://proceedings.mlr.press/v119/dai20a.html},
  abstract  = {This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.}
}
Endnote
%0 Conference Paper
%T R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games
%A Zhongxiang Dai
%A Yizhou Chen
%A Bryan Kian Hsiang Low
%A Patrick Jaillet
%A Teck-Hua Ho
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-dai20a
%I PMLR
%P 2291--2301
%U https://proceedings.mlr.press/v119/dai20a.html
%V 119
%X This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.
APA
Dai, Z., Chen, Y., Low, B.K.H., Jaillet, P. & Ho, T. (2020). R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:2291-2301. Available from https://proceedings.mlr.press/v119/dai20a.html.
