Bandits with Knapsacks and Predictions
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:1189-1206, 2024.
Abstract
We study the Bandits with Knapsacks problem with the aim of designing a learning-augmented online learning algorithm that achieves better regret guarantees than the state-of-the-art primal-dual algorithms with worst-case guarantees, under both stochastic and adversarial inputs. In the adversarial case, we obtain better competitive ratios when the input predictions are accurate, while still maintaining worst-case guarantees when predictions are imprecise. We introduce two algorithms, tailored to the full-feedback and bandit-feedback settings, respectively. Both integrate a static prediction with a worst-case no-α-regret algorithm. This yields an optimized competitive ratio of (π + (1−π)/α)^{−1} when the prediction is perfect, and a competitive ratio of α/(1−π) when the prediction is highly imprecise, where π ∈ (0,1) is chosen by the learner and α is Slater's parameter. We complement this analysis by studying the stochastic setting under full feedback. There, we provide an algorithm that guarantees a pseudo-regret of Õ(√T) with poor predictions and zero pseudo-regret with perfect predictions. We also characterize the smoothness of the algorithm.
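The high-level scheme described above — mixing a static prediction with a worst-case no-regret algorithm via a learner-chosen parameter π — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual pseudocode: the names `prediction_action`, `base_policy`, and `reward_fn` are hypothetical placeholders, and the real algorithms additionally track knapsack resource consumption.

```python
import random

def prediction_mixed_policy(prediction_action, base_policy, pi, T, reward_fn, rng=None):
    """Hypothetical sketch: at each round, play the static prediction with
    probability pi, otherwise defer to a worst-case no-regret base policy.
    Interfaces are illustrative assumptions, not the paper's definitions."""
    rng = rng or random.Random(0)
    total_reward = 0.0
    for t in range(T):
        if rng.random() < pi:
            action = prediction_action   # trust the (static) prediction
        else:
            action = base_policy(t)      # fall back to the worst-case algorithm
        total_reward += reward_fn(t, action)
    return total_reward
```

At the extremes, π = 1 plays the prediction every round (best when the prediction is perfect), while π = 0 recovers the base algorithm's worst-case behavior; intermediate π trades off between the two, matching the two competitive ratios quoted in the abstract.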