Balancing adaptability and non-exploitability in repeated games
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:559-568, 2022.
Abstract
We study the problem of adaptability in repeated games: simultaneously guaranteeing low regret against several classes of opponents. We add the constraint that our algorithm be non-exploitable: the opponent has no incentive to use an algorithm that keeps our rewards from exceeding some “fair” value. Our solution is an expert algorithm (LAFF) that searches within a set of sub-algorithms, each optimal for a different opponent class, and punishes evidence of exploitation by switching to a policy that enforces a fair solution. Using benchmarks that depend on the opponent class, we first show that LAFF has sublinear regret uniformly over these classes. Second, we show that LAFF discourages exploitation, because exploitative opponents suffer linear regret. To our knowledge, this work is the first to provide guarantees for both regret and non-exploitability in multi-agent learning.
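The abstract only states LAFF's high-level structure, so the following is a minimal Python sketch of that control flow, not the paper's actual algorithm. The names `SubAlgorithm`, `punishment_action`, `play_round`, and the tolerance `tol` are hypothetical placeholders, and the exploitation test here is a simplified running-average threshold rather than the statistical test the paper would use.

```python
import random


class SubAlgorithm:
    """Hypothetical sub-algorithm tuned to one opponent class (placeholder)."""

    def __init__(self, name):
        self.name = name

    def act(self, history):
        # Placeholder policy; a real sub-algorithm would be near-optimal
        # against its target opponent class.
        return random.choice([0, 1])


def punishment_action(history):
    # Placeholder for a policy that enforces the fair value
    # (e.g., a threat or minimax strategy).
    return 0


def laff_sketch(experts, fair_value, horizon, play_round, tol=0.1):
    """Simplified control flow: follow a candidate expert; if the running
    average reward drops more than `tol` below `fair_value`, treat this as
    evidence of exploitation, advance to the next candidate, and switch to
    the punishment policy once all candidates are exhausted."""
    history, total_reward = [], 0.0
    idx, punishing = 0, False
    for t in range(1, horizon + 1):
        if punishing:
            action = punishment_action(history)
        else:
            action = experts[idx].act(history)
        reward, opp_action = play_round(action)
        total_reward += reward
        history.append((action, opp_action, reward))
        if not punishing and total_reward / t < fair_value - tol:
            idx += 1  # current expert appears exploited; try the next one
            if idx >= len(experts):
                punishing = True
    return total_reward


if __name__ == "__main__":
    # Toy environment: opponent plays uniformly at random; reward 1 on a match.
    def play_round(action):
        opp = random.choice([0, 1])
        return (1.0 if action == opp else 0.0), opp

    experts = [SubAlgorithm("vs-stationary"), SubAlgorithm("vs-adaptive")]
    print(laff_sketch(experts, fair_value=0.4, horizon=1000, play_round=play_round))
```

In this sketch the switch to the punishment policy is permanent, which mirrors the deterrence idea in the abstract: an opponent that drives our average reward below the fair value forfeits any further gains from exploitation.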