Bandits with Knapsacks: Advice on Time-Varying Demands

Lixing Lyu, Wang Chi Cheung
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:23212-23238, 2023.

Abstract

We consider a non-stationary Bandits with Knapsacks problem. The outcome distribution at each time is scaled by a non-stationary quantity that signifies changing demand volumes. Instead of studying settings with limited non-stationarity, we investigate how online predictions on the total demand volume $Q$ allow us to improve our performance guarantees. We show that, without any prediction, any online algorithm incurs a linear-in-$T$ regret. In contrast, with online predictions on $Q$, we propose an online algorithm that judiciously incorporates the predictions and achieves regret bounds that depend on the accuracy of the predictions. These bounds are shown to be tight in settings where prediction accuracy improves across time. Our theoretical results are corroborated by our numerical findings.
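To make the setting concrete, the following is a minimal sketch in our own notation; the symbols $q_t$, $r$, $c$, $B$, $\hat{Q}_t$ and the benchmark are our illustrative reading of the description above, not necessarily the paper's exact model. At each round $t = 1, \ldots, T$, a demand factor $q_t \geq 0$ scales the outcome distribution: pulling arm $a_t$ yields expected reward $q_t\, r(a_t)$ and expected resource consumption $q_t\, c(a_t)$, subject to a knapsack budget $B$, with total demand volume $Q = \sum_{t=1}^{T} q_t$. Against the offline optimum $\mathrm{OPT}$ under the same budget, the regret is

$$\mathrm{Reg}(T) \;=\; \mathrm{OPT} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} q_t\, r(a_t)\right].$$

Under this reading, the abstract's claims say that without predictions of $Q$ this quantity is $\Omega(T)$ for any online algorithm, whereas with online predictions $\hat{Q}_t$ of $Q$ the proposed algorithm attains regret bounds that degrade gracefully with the prediction errors $|\hat{Q}_t - Q|$.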

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-lyu23a,
  title     = {Bandits with Knapsacks: Advice on Time-Varying Demands},
  author    = {Lyu, Lixing and Cheung, Wang Chi},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {23212--23238},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/lyu23a/lyu23a.pdf},
  url       = {https://proceedings.mlr.press/v202/lyu23a.html},
  abstract  = {We consider a non-stationary Bandits with Knapsacks problem. The outcome distribution at each time is scaled by a non-stationary quantity that signifies changing demand volumes. Instead of studying settings with limited non-stationarity, we investigate how online predictions on the total demand volume $Q$ allow us to improve our performance guarantees. We show that, without any prediction, any online algorithm incurs a linear-in-$T$ regret. In contrast, with online predictions on $Q$, we propose an online algorithm that judiciously incorporates the predictions and achieves regret bounds that depend on the accuracy of the predictions. These bounds are shown to be tight in settings where prediction accuracy improves across time. Our theoretical results are corroborated by our numerical findings.}
}
Endnote
%0 Conference Paper
%T Bandits with Knapsacks: Advice on Time-Varying Demands
%A Lixing Lyu
%A Wang Chi Cheung
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-lyu23a
%I PMLR
%P 23212--23238
%U https://proceedings.mlr.press/v202/lyu23a.html
%V 202
%X We consider a non-stationary Bandits with Knapsacks problem. The outcome distribution at each time is scaled by a non-stationary quantity that signifies changing demand volumes. Instead of studying settings with limited non-stationarity, we investigate how online predictions on the total demand volume $Q$ allow us to improve our performance guarantees. We show that, without any prediction, any online algorithm incurs a linear-in-$T$ regret. In contrast, with online predictions on $Q$, we propose an online algorithm that judiciously incorporates the predictions and achieves regret bounds that depend on the accuracy of the predictions. These bounds are shown to be tight in settings where prediction accuracy improves across time. Our theoretical results are corroborated by our numerical findings.
APA
Lyu, L. & Cheung, W.C. (2023). Bandits with Knapsacks: Advice on Time-Varying Demands. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:23212-23238. Available from https://proceedings.mlr.press/v202/lyu23a.html.
