High-dimensional Linear Bandits with Knapsacks

Wanteng Ma, Dong Xia, Jiashuo Jiang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:34008-34037, 2024.

Abstract

We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large. We investigate how to exploit the sparsity structure to achieve improved regret for the CBwK problem. To this end, we first develop an online variant of the hard thresholding algorithm that performs the optimal sparse estimation. We further combine our online estimator with a primal-dual framework, where we assign a dual variable to each knapsack constraint and utilize an online learning algorithm to update the dual variable, thereby controlling the consumption of the knapsack capacity. We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension, thus improving the polynomial dependency established in the previous literature. We also apply our framework to the high-dimension contextual bandit problem without the knapsack constraint and achieve optimal regret in both the data-poor regime and the data-rich regime.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ma24p, title = {High-dimensional Linear Bandits with Knapsacks}, author = {Ma, Wanteng and Xia, Dong and Jiang, Jiashuo}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, pages = {34008--34037}, year = {2024}, editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix}, volume = {235}, series = {Proceedings of Machine Learning Research}, month = {21--27 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ma24p/ma24p.pdf}, url = {https://proceedings.mlr.press/v235/ma24p.html}, abstract = {We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large. We investigate how to exploit the sparsity structure to achieve improved regret for the CBwK problem. To this end, we first develop an online variant of the hard thresholding algorithm that performs the optimal sparse estimation. We further combine our online estimator with a primal-dual framework, where we assign a dual variable to each knapsack constraint and utilize an online learning algorithm to update the dual variable, thereby controlling the consumption of the knapsack capacity. We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension, thus improving the polynomial dependency established in the previous literature. We also apply our framework to the high-dimension contextual bandit problem without the knapsack constraint and achieve optimal regret in both the data-poor regime and the data-rich regime.} }
Endnote
%0 Conference Paper %T High-dimensional Linear Bandits with Knapsacks %A Wanteng Ma %A Dong Xia %A Jiashuo Jiang %B Proceedings of the 41st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Ruslan Salakhutdinov %E Zico Kolter %E Katherine Heller %E Adrian Weller %E Nuria Oliver %E Jonathan Scarlett %E Felix Berkenkamp %F pmlr-v235-ma24p %I PMLR %P 34008--34037 %U https://proceedings.mlr.press/v235/ma24p.html %V 235 %X We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large. We investigate how to exploit the sparsity structure to achieve improved regret for the CBwK problem. To this end, we first develop an online variant of the hard thresholding algorithm that performs the optimal sparse estimation. We further combine our online estimator with a primal-dual framework, where we assign a dual variable to each knapsack constraint and utilize an online learning algorithm to update the dual variable, thereby controlling the consumption of the knapsack capacity. We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension, thus improving the polynomial dependency established in the previous literature. We also apply our framework to the high-dimension contextual bandit problem without the knapsack constraint and achieve optimal regret in both the data-poor regime and the data-rich regime.
APA
Ma, W., Xia, D. & Jiang, J.. (2024). High-dimensional Linear Bandits with Knapsacks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:34008-34037 Available from https://proceedings.mlr.press/v235/ma24p.html.

Related Material