No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization

Martino Bernasconi, Matteo Castiglioni, Andrea Celli
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:3877-3898, 2025.

Abstract

In the bandits with knapsacks framework (BwK) the learner has $m$ resource-consumption (i.e., packing) constraints. We focus on the generalization of BwK in which the learner has a set of general long-term constraints. The goal of the learner is to maximize their cumulative reward, while at the same time achieving small cumulative constraints violations. In this scenario, there exist simple instances where conventional methods for BwK fail to yield sublinear violations of constraints. We show that it is possible to circumvent this issue by requiring the primal and dual algorithm to be weakly adaptive. Indeed, even without any information on the Slater’s parameter $\rho$ characterizing the problem, the interaction between weakly adaptive primal and dual regret minimizers leads to a “self-bounding” behavior of dual variables. In particular, their norm remains suitably upper bounded across the entire time horizon even without explicit projection steps. By exploiting this property, we provide best-of-both-worlds guarantees for stochastic and adversarial inputs. In the first case, we show that the algorithm guarantees sublinear regret. In the latter case, we establish a tight competitive ratio of $\rho/(1+\rho)$. In both settings, constraints violations are guaranteed to be sublinear in time. Finally, this results allow us to obtain new result for the problem of contextual bandits with linear constraints, providing the first no-$\alpha$-regret guarantees for adversarial contexts.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-bernasconi25a, title = {No-Regret is not enough! {B}andits with General Constraints through Adaptive Regret Minimization}, author = {Bernasconi, Martino and Castiglioni, Matteo and Celli, Andrea}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {3877--3898}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/bernasconi25a/bernasconi25a.pdf}, url = {https://proceedings.mlr.press/v267/bernasconi25a.html}, abstract = {In the bandits with knapsacks framework (BwK) the learner has $m$ resource-consumption (i.e., packing) constraints. We focus on the generalization of BwK in which the learner has a set of general long-term constraints. The goal of the learner is to maximize their cumulative reward, while at the same time achieving small cumulative constraints violations. In this scenario, there exist simple instances where conventional methods for BwK fail to yield sublinear violations of constraints. We show that it is possible to circumvent this issue by requiring the primal and dual algorithm to be weakly adaptive. Indeed, even without any information on the Slater’s parameter $\rho$ characterizing the problem, the interaction between weakly adaptive primal and dual regret minimizers leads to a “self-bounding” behavior of dual variables. In particular, their norm remains suitably upper bounded across the entire time horizon even without explicit projection steps. By exploiting this property, we provide best-of-both-worlds guarantees for stochastic and adversarial inputs. In the first case, we show that the algorithm guarantees sublinear regret. In the latter case, we establish a tight competitive ratio of $\rho/(1+\rho)$. In both settings, constraints violations are guaranteed to be sublinear in time. Finally, this results allow us to obtain new result for the problem of contextual bandits with linear constraints, providing the first no-$\alpha$-regret guarantees for adversarial contexts.} }
Endnote
%0 Conference Paper %T No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization %A Martino Bernasconi %A Matteo Castiglioni %A Andrea Celli %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-bernasconi25a %I PMLR %P 3877--3898 %U https://proceedings.mlr.press/v267/bernasconi25a.html %V 267 %X In the bandits with knapsacks framework (BwK) the learner has $m$ resource-consumption (i.e., packing) constraints. We focus on the generalization of BwK in which the learner has a set of general long-term constraints. The goal of the learner is to maximize their cumulative reward, while at the same time achieving small cumulative constraints violations. In this scenario, there exist simple instances where conventional methods for BwK fail to yield sublinear violations of constraints. We show that it is possible to circumvent this issue by requiring the primal and dual algorithm to be weakly adaptive. Indeed, even without any information on the Slater’s parameter $\rho$ characterizing the problem, the interaction between weakly adaptive primal and dual regret minimizers leads to a “self-bounding” behavior of dual variables. In particular, their norm remains suitably upper bounded across the entire time horizon even without explicit projection steps. By exploiting this property, we provide best-of-both-worlds guarantees for stochastic and adversarial inputs. In the first case, we show that the algorithm guarantees sublinear regret. In the latter case, we establish a tight competitive ratio of $\rho/(1+\rho)$. In both settings, constraints violations are guaranteed to be sublinear in time. Finally, this results allow us to obtain new result for the problem of contextual bandits with linear constraints, providing the first no-$\alpha$-regret guarantees for adversarial contexts.
APA
Bernasconi, M., Castiglioni, M. & Celli, A.. (2025). No-Regret is not enough! Bandits with General Constraints through Adaptive Regret Minimization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:3877-3898 Available from https://proceedings.mlr.press/v267/bernasconi25a.html.

Related Material