Computational Intractability of Strategizing against Online Learners

Angelos Assos, Yuval Dagan, Nived Rajaraman
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:169-199, 2025.

Abstract

Online learning algorithms are widely used in strategic multi-agent settings, including repeated auctions, contract design, and pricing competitions, where agents adapt their strategies over time. A key question in such environments is how an optimizing agent can best respond to a learning agent to improve its own long-term outcomes. While prior work has developed efficient algorithms for the optimizer in special cases—such as structured auction settings or contract design—no general efficient algorithm is known. In this paper, we establish a strong computational hardness result: unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can compute a near-optimal strategy against a learner using a standard no-regret algorithm, specifically Multiplicative Weights Update (MWU). Our result proves an $\Omega(T)$ hardness bound, significantly strengthening previous work that only showed an additive $\Theta(1)$ impossibility result. Furthermore, while the prior hardness result focused on learners using fictitious play—an algorithm that is not no-regret—we prove intractability for a widely used no-regret learning algorithm. This establishes a fundamental computational barrier to finding optimal strategies in general game-theoretic settings.
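To make the interaction concrete, below is a minimal Python sketch of the setting the abstract describes: a learner that maintains a mixed strategy via Multiplicative Weights Update while an optimizer plays a sequence of actions over T rounds. The game matrices, step size eta, and helper names (mwu_strategy, simulate) are illustrative assumptions, not taken from the paper; the paper's hardness result concerns computing a near-optimal action sequence for the optimizer, which this sketch merely evaluates for a given sequence.

import numpy as np

def mwu_strategy(cum_loss, eta):
    # MWU: weight each learner action by exp(-eta * cumulative loss); subtract
    # the minimum first for numerical stability, then normalize to a distribution.
    w = np.exp(-eta * (cum_loss - cum_loss.min()))
    return w / w.sum()

def simulate(learner_loss, optimizer_payoff, optimizer_actions, eta=0.1):
    # learner_loss[i, j]: loss to the learner when it plays i and the optimizer plays j.
    # optimizer_payoff[i, j]: payoff to the optimizer in the same case.
    n = learner_loss.shape[0]
    cum_loss = np.zeros(n)
    total = 0.0
    for j in optimizer_actions:
        p = mwu_strategy(cum_loss, eta)       # learner's MWU mixed strategy this round
        total += p @ optimizer_payoff[:, j]   # optimizer's expected payoff this round
        cum_loss += learner_loss[:, j]        # learner observes its full loss vector
    return total

# Toy 2x2 game with made-up payoffs, played for T = 100 rounds.
learner_loss = np.array([[0.0, 1.0],
                         [1.0, 0.0]])
optimizer_payoff = np.array([[1.0, 0.0],
                             [0.0, 0.5]])
print(simulate(learner_loss, optimizer_payoff, [0] * 100))

In these terms, the paper's result says that, unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can find an action sequence whose total payoff comes within $\Omega(T)$ of the optimum in general games.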

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-assos25a,
  title = {Computational Intractability of Strategizing against Online Learners},
  author = {Assos, Angelos and Dagan, Yuval and Rajaraman, Nived},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages = {169--199},
  year = {2025},
  editor = {Haghtalab, Nika and Moitra, Ankur},
  volume = {291},
  series = {Proceedings of Machine Learning Research},
  month = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/assos25a/assos25a.pdf},
  url = {https://proceedings.mlr.press/v291/assos25a.html},
  abstract = {Online learning algorithms are widely used in strategic multi-agent settings, including repeated auctions, contract design, and pricing competitions, where agents adapt their strategies over time. A key question in such environments is how an optimizing agent can best respond to a learning agent to improve its own long-term outcomes. While prior work has developed efficient algorithms for the optimizer in special cases—such as structured auction settings or contract design—no general efficient algorithm is known. In this paper, we establish a strong computational hardness result: unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can compute a near-optimal strategy against a learner using a standard no-regret algorithm, specifically Multiplicative Weights Update (MWU). Our result proves an $\Omega(T)$ hardness bound, significantly strengthening previous work that only showed an additive $\Theta(1)$ impossibility result. Furthermore, while the prior hardness result focused on learners using fictitious play—an algorithm that is not no-regret—we prove intractability for a widely used no-regret learning algorithm. This establishes a fundamental computational barrier to finding optimal strategies in general game-theoretic settings.}
}
Endnote
%0 Conference Paper
%T Computational Intractability of Strategizing against Online Learners
%A Angelos Assos
%A Yuval Dagan
%A Nived Rajaraman
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-assos25a
%I PMLR
%P 169--199
%U https://proceedings.mlr.press/v291/assos25a.html
%V 291
%X Online learning algorithms are widely used in strategic multi-agent settings, including repeated auctions, contract design, and pricing competitions, where agents adapt their strategies over time. A key question in such environments is how an optimizing agent can best respond to a learning agent to improve its own long-term outcomes. While prior work has developed efficient algorithms for the optimizer in special cases—such as structured auction settings or contract design—no general efficient algorithm is known. In this paper, we establish a strong computational hardness result: unless $\mathsf{P} = \mathsf{NP}$, no polynomial-time optimizer can compute a near-optimal strategy against a learner using a standard no-regret algorithm, specifically Multiplicative Weights Update (MWU). Our result proves an $\Omega(T)$ hardness bound, significantly strengthening previous work that only showed an additive $\Theta(1)$ impossibility result. Furthermore, while the prior hardness result focused on learners using fictitious play—an algorithm that is not no-regret—we prove intractability for a widely used no-regret learning algorithm. This establishes a fundamental computational barrier to finding optimal strategies in general game-theoretic settings.
APA
Assos, A., Dagan, Y. & Rajaraman, N. (2025). Computational Intractability of Strategizing against Online Learners. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:169-199. Available from https://proceedings.mlr.press/v291/assos25a.html.