Learning to Steer Learners in Games

Yizhou Zhang, Yian Ma, Eric Mazumdar
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:75695-75733, 2025.

Abstract

We consider the problem of learning to exploit learning algorithms through repeated interactions in games. Specifically, we focus on repeated two-player, finite-action games in which an optimizer aims to steer a no-regret learner to a Stackelberg equilibrium without knowledge of the learner’s payoffs. We first show that this is impossible if the optimizer knows only that the learner is using an algorithm from the general class of no-regret algorithms. This suggests that the optimizer needs more information about the learner’s objectives or algorithm to exploit it successfully. Building on this intuition, we reduce the optimizer’s problem to that of recovering the learner’s payoff structure. We demonstrate the effectiveness of this approach when the learner’s algorithm is drawn from a smaller class, analyzing two examples: one where the learner uses an ascent algorithm, and another where the learner uses stochastic mirror ascent with a known regularizer and step sizes.
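
As an illustration of the second example, the sketch below (not code from the paper) simulates a learner running stochastic mirror ascent with the entropic regularizer, i.e., exponentiated-gradient/multiplicative-weights updates, against a fixed optimizer strategy in a finite-action game. The payoff matrix B, the uniform optimizer strategy x_opt, and the 1/sqrt(t) step-size schedule are all illustrative assumptions; against a fixed opponent these updates drive the learner toward a best response, which is the kind of predictable behavior an optimizer would need to anticipate in order to steer.

    # Hypothetical illustration (not the paper's code): a learner running
    # stochastic mirror ascent with the entropic regularizer against a
    # fixed mixed strategy of the optimizer in a two-player finite-action game.
    import numpy as np

    rng = np.random.default_rng(0)

    n_learner, n_opt = 3, 3
    # B[i, j] = learner's payoff for playing action i against optimizer action j
    # (randomly generated here purely for illustration).
    B = rng.uniform(size=(n_learner, n_opt))

    x_opt = np.full(n_opt, 1.0 / n_opt)       # optimizer's fixed (uniform) mixed strategy
    y = np.full(n_learner, 1.0 / n_learner)   # learner's mixed strategy on the simplex

    T = 5000
    for t in range(1, T + 1):
        j = rng.choice(n_opt, p=x_opt)   # optimizer's realized action this round
        grad = B[:, j]                   # stochastic gradient of the learner's expected payoff
        eta = 1.0 / np.sqrt(t)           # assumed known step-size schedule
        y = y * np.exp(eta * grad)       # mirror ascent step under the entropic regularizer
        y /= y.sum()                     # normalization = Bregman projection onto the simplex

    print("learner's strategy after T rounds:", y.round(3))
    print("best-response action to x_opt:   ", np.argmax(B @ x_opt))

Since the entropic regularizer makes the mirror step multiplicative, the learner's strategy concentrates on the action maximizing its expected payoff against x_opt; the paper's setting asks how an optimizer can choose x_opt over time, without knowing B, so that this kind of dynamic settles at the Stackelberg equilibrium.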

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhang25bd,
  title     = {Learning to Steer Learners in Games},
  author    = {Zhang, Yizhou and Ma, Yian and Mazumdar, Eric},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {75695--75733},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhang25bd/zhang25bd.pdf},
  url       = {https://proceedings.mlr.press/v267/zhang25bd.html}
}
Endnote
%0 Conference Paper
%T Learning to Steer Learners in Games
%A Yizhou Zhang
%A Yian Ma
%A Eric Mazumdar
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhang25bd
%I PMLR
%P 75695--75733
%U https://proceedings.mlr.press/v267/zhang25bd.html
%V 267
APA
Zhang, Y., Ma, Y. & Mazumdar, E. (2025). Learning to Steer Learners in Games. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:75695-75733. Available from https://proceedings.mlr.press/v267/zhang25bd.html.