Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles

Yuxuan Han, Jialin Zeng, Yang Wang, Yang Xiang, Jiheng Zhang
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:5011-5035, 2023.

Abstract

We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon a context, not only leads to a random reward but also costs a random resource consumption in a vector form. The challenge is to maximize the total reward without violating the budget for each resource. We study this problem under a general realizability setting where the expected reward and expected cost are functions of contexts and actions in some given general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively. Existing works on CBwK are restricted to the linear function class since they use UCB-type algorithms, which heavily rely on the linear form and thus are difficult to extend to general function classes. Motivated by online regression oracles that have been successfully applied to contextual bandits, we propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression. We also establish the lower regret bound to show the optimality of our algorithm for a variety of function classes.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-han23b, title = {Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles}, author = {Han, Yuxuan and Zeng, Jialin and Wang, Yang and Xiang, Yang and Zhang, Jiheng}, booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics}, pages = {5011--5035}, year = {2023}, editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem}, volume = {206}, series = {Proceedings of Machine Learning Research}, month = {25--27 Apr}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v206/han23b/han23b.pdf}, url = {https://proceedings.mlr.press/v206/han23b.html}, abstract = {We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon a context, not only leads to a random reward but also costs a random resource consumption in a vector form. The challenge is to maximize the total reward without violating the budget for each resource. We study this problem under a general realizability setting where the expected reward and expected cost are functions of contexts and actions in some given general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively. Existing works on CBwK are restricted to the linear function class since they use UCB-type algorithms, which heavily rely on the linear form and thus are difficult to extend to general function classes. Motivated by online regression oracles that have been successfully applied to contextual bandits, we propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression. We also establish the lower regret bound to show the optimality of our algorithm for a variety of function classes.} }
Endnote
%0 Conference Paper %T Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles %A Yuxuan Han %A Jialin Zeng %A Yang Wang %A Yang Xiang %A Jiheng Zhang %B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2023 %E Francisco Ruiz %E Jennifer Dy %E Jan-Willem van de Meent %F pmlr-v206-han23b %I PMLR %P 5011--5035 %U https://proceedings.mlr.press/v206/han23b.html %V 206 %X We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon a context, not only leads to a random reward but also costs a random resource consumption in a vector form. The challenge is to maximize the total reward without violating the budget for each resource. We study this problem under a general realizability setting where the expected reward and expected cost are functions of contexts and actions in some given general function classes $\mathcal{F}$ and $\mathcal{G}$, respectively. Existing works on CBwK are restricted to the linear function class since they use UCB-type algorithms, which heavily rely on the linear form and thus are difficult to extend to general function classes. Motivated by online regression oracles that have been successfully applied to contextual bandits, we propose the first universal and optimal algorithmic framework for CBwK by reducing it to online regression. We also establish the lower regret bound to show the optimality of our algorithm for a variety of function classes.
APA
Han, Y., Zeng, J., Wang, Y., Xiang, Y. & Zhang, J.. (2023). Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:5011-5035 Available from https://proceedings.mlr.press/v206/han23b.html.

Related Material