Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals

Junyan Liu, Arnab Maiti, Artin Tajdini, Kevin Jamieson, Lillian J. Ratliff
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:39085-39105, 2025.

Abstract

We initiate the study of a repeated principal-agent problem over a finite horizon $T$, where a principal sequentially interacts with $K\geq 2$ types of agents arriving in an adversarial order. At each round, the principal strategically chooses one of the $N$ arms to incentivize for an arriving agent of unknown type. The agent then chooses an arm based on its own utility and the provided incentive, and the principal receives a corresponding reward. The objective is to minimize regret against the best incentive in hindsight. Without prior knowledge of agent behavior, we show that the problem becomes intractable, leading to linear regret. We analyze two key settings where sublinear regret is achievable. In the first setting, the principal knows the arm each agent type would select greedily for any given incentive. Under this setting, we propose an algorithm that achieves a regret bound of $\mathcal{O}(\min(\sqrt{KT\log N},K\sqrt{T}))$ and provide a matching lower bound up to a $\log K$ factor. In the second setting, an agent’s response varies smoothly with the incentive and is governed by a Lipschitz constant $L$. Under this setting, we show that there is an algorithm with a regret bound of $\tilde{\mathcal{O}}((LN)^{1/3}T^{2/3})$ and establish a matching lower bound up to logarithmic factors. Finally, we extend our algorithmic results for both settings by allowing the principal to incentivize multiple arms simultaneously in each round.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-liu25at, title = {Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals}, author = {Liu, Junyan and Maiti, Arnab and Tajdini, Artin and Jamieson, Kevin and Ratliff, Lillian J.}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {39085--39105}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25at/liu25at.pdf}, url = {https://proceedings.mlr.press/v267/liu25at.html}, abstract = {We initiate the study of a repeated principal-agent problem over a finite horizon $T$, where a principal sequentially interacts with $K\geq 2$ types of agents arriving in an adversarial order. At each round, the principal strategically chooses one of the $N$ arms to incentivize for an arriving agent of unknown type. The agent then chooses an arm based on its own utility and the provided incentive, and the principal receives a corresponding reward. The objective is to minimize regret against the best incentive in hindsight. Without prior knowledge of agent behavior, we show that the problem becomes intractable, leading to linear regret. We analyze two key settings where sublinear regret is achievable. In the first setting, the principal knows the arm each agent type would select greedily for any given incentive. Under this setting, we propose an algorithm that achieves a regret bound of $\mathcal{O}(\min(\sqrt{KT\log N},K\sqrt{T}))$ and provide a matching lower bound up to a $\log K$ factor. In the second setting, an agent’s response varies smoothly with the incentive and is governed by a Lipschitz constant $L$. Under this setting, we show that there is an algorithm with a regret bound of $\tilde{\mathcal{O}}((LN)^{1/3}T^{2/3})$ and establish a matching lower bound up to logarithmic factors. Finally, we extend our algorithmic results for both settings by allowing the principal to incentivize multiple arms simultaneously in each round.} }
Endnote
%0 Conference Paper %T Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals %A Junyan Liu %A Arnab Maiti %A Artin Tajdini %A Kevin Jamieson %A Lillian J. Ratliff %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-liu25at %I PMLR %P 39085--39105 %U https://proceedings.mlr.press/v267/liu25at.html %V 267 %X We initiate the study of a repeated principal-agent problem over a finite horizon $T$, where a principal sequentially interacts with $K\geq 2$ types of agents arriving in an adversarial order. At each round, the principal strategically chooses one of the $N$ arms to incentivize for an arriving agent of unknown type. The agent then chooses an arm based on its own utility and the provided incentive, and the principal receives a corresponding reward. The objective is to minimize regret against the best incentive in hindsight. Without prior knowledge of agent behavior, we show that the problem becomes intractable, leading to linear regret. We analyze two key settings where sublinear regret is achievable. In the first setting, the principal knows the arm each agent type would select greedily for any given incentive. Under this setting, we propose an algorithm that achieves a regret bound of $\mathcal{O}(\min(\sqrt{KT\log N},K\sqrt{T}))$ and provide a matching lower bound up to a $\log K$ factor. In the second setting, an agent’s response varies smoothly with the incentive and is governed by a Lipschitz constant $L$. Under this setting, we show that there is an algorithm with a regret bound of $\tilde{\mathcal{O}}((LN)^{1/3}T^{2/3})$ and establish a matching lower bound up to logarithmic factors. Finally, we extend our algorithmic results for both settings by allowing the principal to incentivize multiple arms simultaneously in each round.
APA
Liu, J., Maiti, A., Tajdini, A., Jamieson, K. & Ratliff, L.J.. (2025). Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:39085-39105 Available from https://proceedings.mlr.press/v267/liu25at.html.

Related Material