Balancing adaptability and non-exploitability in repeated games

Anthony DiGiovanni, Ambuj Tewari
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:559-568, 2022.

Abstract

We study the problem of adaptability in repeated games: simultaneously guaranteeing low regret for several classes of opponents. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some “fair” value. Our solution is an expert algorithm (LAFF), which searches within a set of sub-algorithms that are optimal for each opponent class, and punishes evidence of exploitation by switching to a policy that enforces a fair solution. With benchmarks that depend on the opponent class, we first show that LAFF has sublinear regret uniformly over these classes. Second, we show that LAFF discourages exploitation, because exploitative opponents have linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-digiovanni22a, title = {Balancing adaptability and non-exploitability in repeated games}, author = {DiGiovanni, Anthony and Tewari, Ambuj}, booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence}, pages = {559--568}, year = {2022}, editor = {Cussens, James and Zhang, Kun}, volume = {180}, series = {Proceedings of Machine Learning Research}, month = {01--05 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v180/digiovanni22a/digiovanni22a.pdf}, url = {https://proceedings.mlr.press/v180/digiovanni22a.html}, abstract = {We study the problem of adaptability in repeated games: simultaneously guaranteeing low regret for several classes of opponents. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some “fair” value. Our solution is an expert algorithm (LAFF), which searches within a set of sub-algorithms that are optimal for each opponent class, and punishes evidence of exploitation by switching to a policy that enforces a fair solution. With benchmarks that depend on the opponent class, we first show that LAFF has sublinear regret uniformly over these classes. Second, we show that LAFF discourages exploitation, because exploitative opponents have linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.} }
Endnote
%0 Conference Paper %T Balancing adaptability and non-exploitability in repeated games %A Anthony DiGiovanni %A Ambuj Tewari %B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2022 %E James Cussens %E Kun Zhang %F pmlr-v180-digiovanni22a %I PMLR %P 559--568 %U https://proceedings.mlr.press/v180/digiovanni22a.html %V 180 %X We study the problem of adaptability in repeated games: simultaneously guaranteeing low regret for several classes of opponents. We add the constraint that our algorithm is non-exploitable, in that the opponent lacks an incentive to use an algorithm against which we cannot achieve rewards exceeding some “fair” value. Our solution is an expert algorithm (LAFF), which searches within a set of sub-algorithms that are optimal for each opponent class, and punishes evidence of exploitation by switching to a policy that enforces a fair solution. With benchmarks that depend on the opponent class, we first show that LAFF has sublinear regret uniformly over these classes. Second, we show that LAFF discourages exploitation, because exploitative opponents have linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.
APA
DiGiovanni, A. & Tewari, A.. (2022). Balancing adaptability and non-exploitability in repeated games. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:559-568 Available from https://proceedings.mlr.press/v180/digiovanni22a.html.

Related Material