Online Learning in Stackelberg Games with an Omniscient Follower

Geng Zhao; Banghua Zhu; Jiantao Jiao; Michael Jordan

Online Learning in Stackelberg Games with an Omniscient Follower

Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael Jordan

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:42304-42316, 2023.

Abstract

We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader’s move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader’s actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games. This poses unique challenges for the learning process of the leader and the subsequent regret analysis.

Cite this Paper

BibTeX

@InProceedings{pmlr-v202-zhao23o,
  title = 	 {Online Learning in Stackelberg Games with an Omniscient Follower},
  author =       {Zhao, Geng and Zhu, Banghua and Jiao, Jiantao and Jordan, Michael},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {42304--42316},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/zhao23o/zhao23o.pdf},
  url = 	 {https://proceedings.mlr.press/v202/zhao23o.html},
  abstract = 	 {We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader’s move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader’s actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games. This poses unique challenges for the learning process of the leader and the subsequent regret analysis.}
}

Endnote

%0 Conference Paper
%T Online Learning in Stackelberg Games with an Omniscient Follower
%A Geng Zhao
%A Banghua Zhu
%A Jiantao Jiao
%A Michael Jordan
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-zhao23o
%I PMLR
%P 42304--42316
%U https://proceedings.mlr.press/v202/zhao23o.html
%V 202
%X We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader’s move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader’s actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games. This poses unique challenges for the learning process of the leader and the subsequent regret analysis.

APA

Zhao, G., Zhu, B., Jiao, J. & Jordan, M.. (2023). Online Learning in Stackelberg Games with an Omniscient Follower. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:42304-42316 Available from https://proceedings.mlr.press/v202/zhao23o.html.

Online Learning in Stackelberg Games with an Omniscient Follower

Abstract

Cite this Paper

Related Material