Bandits with Knapsacks and Predictions
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:1189-1206, 2024.
Abstract
We study the Bandits with Knapsacks problem with the aim of designing a learning-augmented online learning algorithm that achieves better regret guarantees than the state-of-the-art primal-dual algorithms with worst-case guarantees, under both stochastic and adversarial inputs. In the adversarial case, we obtain better competitive ratios when the input predictions are accurate, while also maintaining worst-case guarantees when the predictions are imprecise. We introduce two algorithms, tailored to the full-feedback and bandit-feedback settings, respectively. Both algorithms integrate a static prediction with a worst-case no-$\alpha$-regret algorithm. This yields an optimized competitive ratio of $(\pi + (1 - \pi)/\alpha)^{-1}$ when the prediction is perfect, and a competitive ratio of $\alpha/(1 - \pi)$ when the prediction is highly imprecise, where $\pi \in (0,1)$ is a parameter chosen by the learner and $\alpha$ is Slater’s parameter. We complement this analysis by studying the stochastic setting under full feedback. We provide an algorithm which guarantees a pseudo-regret of $\widetilde{O}(\sqrt{T})$ with poor predictions, and zero pseudo-regret with perfect predictions. We also characterize the smoothness of the algorithm.
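For concreteness, with $\alpha = 2$ and $\pi = 1/2$ the two ratios above become $(1/2 + 1/4)^{-1} = 4/3 < \alpha$ under a perfect prediction, at the price of $\alpha/(1 - \pi) = 4$ under an arbitrarily bad one. The abstract does not spell out how the static prediction and the worst-case subroutine are combined; the sketch below is one plausible budget-splitting reading, in which a $\pi$-fraction of the total budget $B$ is reserved for following the prediction and the remaining $(1 - \pi)$-fraction for a worst-case no-$\alpha$-regret fallback. All names here (`meta_bwk`, `UniformFallback`, `draw`, `predicted_arm`) are hypothetical, and the uniform fallback is only a stand-in for a real primal-dual subroutine.

```python
import random

class UniformFallback:
    """Stand-in for a worst-case no-alpha-regret BwK subroutine:
    it simply plays arms uniformly at random."""
    def __init__(self, n_arms):
        self.n_arms = n_arms

    def select_action(self):
        return random.randrange(self.n_arms)

    def update(self, arm, reward, cost):
        pass  # a real primal-dual subroutine would update its duals here

def meta_bwk(T, B, pi, predicted_arm, fallback, draw):
    """Budget-splitting meta-algorithm (hypothetical reading of the
    abstract): pi*B budget follows the static prediction, (1-pi)*B
    goes to the fallback. draw(arm) returns (reward, cost) in [0, 1]."""
    budget_pred = pi * B          # share reserved for the prediction
    budget_fall = (1.0 - pi) * B  # share reserved for the fallback
    total_reward = 0.0
    for _ in range(T):
        use_pred = budget_pred >= 1.0   # prediction still affordable?
        if use_pred:
            arm = predicted_arm
        elif budget_fall >= 1.0:        # otherwise defer to the fallback
            arm = fallback.select_action()
        else:
            break                       # both shares (conservatively) spent
        reward, cost = draw(arm)
        total_reward += reward
        if use_pred:
            budget_pred -= cost
        else:
            budget_fall -= cost
            fallback.update(arm, reward, cost)
    return total_reward

if __name__ == "__main__":
    random.seed(0)
    means = [0.9, 0.2, 0.5]  # toy Bernoulli reward means; each pull costs 1
    draw = lambda arm: (float(random.random() < means[arm]), 1.0)
    fb = UniformFallback(n_arms=len(means))
    print(meta_bwk(T=1000, B=300, pi=0.5, predicted_arm=0, fallback=fb, draw=draw))
```

In a faithful instantiation, `UniformFallback` would be replaced by the paper's worst-case no-$\alpha$-regret primal-dual algorithm and `draw` by the (stochastic or adversarial) environment; the sketch only illustrates how the parameter $\pi$ trades prediction-following against worst-case robustness.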