Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions

Negin Golrezaei, Sourav Sahoo
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19841-19877, 2025.

Abstract

We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a single value-maximizing buyer who aims to maximize their cumulative value over $T$ rounds while adhering to return-on-investment (RoI) constraints in each round. Buyers adopt the $m$-uniform bidding format, in which they submit $m$ bid-quantity pairs $(b_i, q_i)$ to demand $q_i$ units at bid $b_i$. We introduce safe bidding strategies as those that satisfy RoI constraints in every auction, regardless of competing bids. We show that these strategies depend only on the bidder’s valuation curve, and that the bidder can focus on a finite subset of this class without loss of generality. While the number of strategies in this subset is exponential in $m$, we develop a polynomial-time algorithm to learn the optimal safe strategy that achieves sublinear regret in the online setting, where regret is measured against a clairvoyant benchmark that knows the competing bids a priori and selects a fixed hindsight-optimal safe strategy. We then evaluate the performance of safe strategies against a clairvoyant that selects the optimal strategy from a richer class of strategies in the online setting. In this scenario, we compute the richness ratio $\alpha\in(0, 1]$ for the class of strategies chosen by the clairvoyant and show that our algorithm, designed to learn safe strategies, achieves $\alpha$-approximate sublinear regret against these stronger benchmarks. Experiments on semi-synthetic data from real-world auctions show that safe strategies substantially outperform the derived theoretical bounds, making them appealing in practice.
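
To make the $m$-uniform bidding format and the per-round RoI constraint concrete, here is a minimal Python sketch of a single auction round. It is an illustration under stated assumptions, not the paper's model or algorithm: the clearing price is assumed to be the last accepted bid, ties are assumed to be broken in favor of competitors, the RoI constraint is assumed to mean "total value of units won is at least the total payment", and the function names (`expand_bids`, `run_auction`, `satisfies_roi`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def expand_bids(pairs: Sequence[Tuple[float, int]]) -> List[float]:
    """Flatten m bid-quantity pairs (b_i, q_i) into one bid per demanded unit."""
    return [b for b, q in pairs for _ in range(q)]


def run_auction(
    pairs: Sequence[Tuple[float, int]],
    competing_bids: Sequence[float],
    supply: int,
) -> Tuple[int, float]:
    """Return (units won by the buyer, per-unit clearing price).

    Assumptions (not from the paper): the price is the last accepted bid,
    and ties are broken in favor of competitors.
    """
    own = [(b, 0) for b in expand_bids(pairs)]     # owner tag 0 = the buyer
    others = [(b, 1) for b in competing_bids]      # owner tag 1 = competitors
    # Sort by bid (descending); among equal bids, competitors come first.
    ranked = sorted(own + others, key=lambda t: (-t[0], -t[1]))
    winners = ranked[:supply]
    units_won = sum(1 for _, owner in winners if owner == 0)
    price = winners[-1][0] if winners else 0.0
    return units_won, price


def satisfies_roi(marginal_values: Sequence[float], units_won: int, price: float) -> bool:
    """Assumed RoI check: value of the units won covers the total payment."""
    value = sum(marginal_values[:units_won])
    return value >= price * units_won


# Hypothetical valuation curve: marginal value of the 1st, 2nd, ... unit.
marginal_values = [10, 8, 5, 2]

# A 2-uniform bid: 1 unit at bid 9, then 2 more units at bid 6.
units, price = run_auction([(9, 1), (6, 2)], competing_bids=[7, 4, 3], supply=3)
roi_ok = satisfies_roi(marginal_values, units, price)

# A 1-uniform bid that overbids for 3 units and can violate the constraint.
units_bad, price_bad = run_auction([(9, 3)], competing_bids=[8.5, 1], supply=3)
roi_bad = satisfies_roi(marginal_values, units_bad, price_bad)
```

In this toy round the first bid wins two units and meets the assumed RoI check, while the second wins three units at a price that exceeds their total value. A safe strategy, in the sense described above, is one for which such a check holds for every possible vector of competing bids.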

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-golrezaei25a,
  title = {Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions},
  author = {Golrezaei, Negin and Sahoo, Sourav},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {19841--19877},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/golrezaei25a/golrezaei25a.pdf},
  url = {https://proceedings.mlr.press/v267/golrezaei25a.html},
  abstract = {We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a single value-maximizing buyer who aims to maximize their cumulative value over $T$ rounds while adhering to return-on-investment (RoI) constraints in each round. Buyers adopt $m$-uniform bidding format, where they submit $m$ bid-quantity pairs $(b_i, q_i)$ to demand $q_i$ units at bid $b_i$. We introduce safe bidding strategies as those that satisfy RoI constraints in every auction, regardless of competing bids. We show that these strategies depend only on the bidder’s valuation curve, and the bidder can focus on a finite subset of this class without loss of generality. While the number of strategies in this subset is exponential in $m$, we develop a polynomial-time algorithm to learn the optimal safe strategy that achieves sublinear regret in the online setting, where regret is measured against a clairvoyant benchmark that knows the competing bids a priori and selects a fixed hindsight optimal safe strategy. We then evaluate the performance of safe strategies against a clairvoyant that selects the optimal strategy from a richer class of strategies in the online setting. In this scenario, we compute the richness ratio, $\alpha\in(0, 1]$ for the class of strategies chosen by the clairvoyant and show that our algorithm, designed to learn safe strategies, achieves $\alpha$-approximate sublinear regret against these stronger benchmarks. Experiments on semi-synthetic data from real-world auctions show that safe strategies substantially outperform the derived theoretical bounds, making them quite appealing in practice.}
}
Endnote
%0 Conference Paper
%T Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions
%A Negin Golrezaei
%A Sourav Sahoo
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-golrezaei25a
%I PMLR
%P 19841--19877
%U https://proceedings.mlr.press/v267/golrezaei25a.html
%V 267
%X We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a single value-maximizing buyer who aims to maximize their cumulative value over $T$ rounds while adhering to return-on-investment (RoI) constraints in each round. Buyers adopt $m$-uniform bidding format, where they submit $m$ bid-quantity pairs $(b_i, q_i)$ to demand $q_i$ units at bid $b_i$. We introduce safe bidding strategies as those that satisfy RoI constraints in every auction, regardless of competing bids. We show that these strategies depend only on the bidder’s valuation curve, and the bidder can focus on a finite subset of this class without loss of generality. While the number of strategies in this subset is exponential in $m$, we develop a polynomial-time algorithm to learn the optimal safe strategy that achieves sublinear regret in the online setting, where regret is measured against a clairvoyant benchmark that knows the competing bids a priori and selects a fixed hindsight optimal safe strategy. We then evaluate the performance of safe strategies against a clairvoyant that selects the optimal strategy from a richer class of strategies in the online setting. In this scenario, we compute the richness ratio, $\alpha\in(0, 1]$ for the class of strategies chosen by the clairvoyant and show that our algorithm, designed to learn safe strategies, achieves $\alpha$-approximate sublinear regret against these stronger benchmarks. Experiments on semi-synthetic data from real-world auctions show that safe strategies substantially outperform the derived theoretical bounds, making them quite appealing in practice.
APA
Golrezaei, N. & Sahoo, S. (2025). Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:19841-19877. Available from https://proceedings.mlr.press/v267/golrezaei25a.html.
