Allocating Divisible Resources on Arms with Unknown and Random Rewards

Wenhao Li, Ningyuan Chen
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:2350-2351, 2023.

Abstract

We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. When the order ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-li23a, title = {Allocating Divisible Resources on Arms with Unknown and Random Rewards}, author = {Li, Wenhao and Chen, Ningyuan}, booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory}, pages = {2350--2351}, year = {2023}, editor = {Neu, Gergely and Rosasco, Lorenzo}, volume = {195}, series = {Proceedings of Machine Learning Research}, month = {12--15 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v195/li23a/li23a.pdf}, url = {https://proceedings.mlr.press/v195/li23a.html}, abstract = {We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. When the order ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.} }
Endnote
%0 Conference Paper %T Allocating Divisible Resources on Arms with Unknown and Random Rewards %A Wenhao Li %A Ningyuan Chen %B Proceedings of Thirty Sixth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2023 %E Gergely Neu %E Lorenzo Rosasco %F pmlr-v195-li23a %I PMLR %P 2350--2351 %U https://proceedings.mlr.press/v195/li23a.html %V 195 %X We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. When the order ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.
APA
Li, W. & Chen, N.. (2023). Allocating Divisible Resources on Arms with Unknown and Random Rewards. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:2350-2351 Available from https://proceedings.mlr.press/v195/li23a.html.

Related Material