A Better Resource Allocation Algorithm with Semi-Bandit Feedback

Yuval Dagan, Koby Crammer
Proceedings of Algorithmic Learning Theory, PMLR 83:268-320, 2018.

Abstract

We study a sequential resource allocation problem between a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a given arm increases the probability that it succeeds, yet with a cut-off. We follow Lattimore et al. (2014) and assume that the probability increases linearly until it equals one, after which allocating more of the resource is wasteful. These cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of $O(\log n)$, improving over the best known bound of $O(\log^2 n)$. Lower bounds we prove show that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.
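The environment model from the abstract can be sketched concretely. Below is a minimal simulation, assuming the standard formalization of "increases linearly until it equals one": an arm with hidden cut-off $\nu_i$ that receives allocation $a_i$ succeeds with probability $\min(a_i/\nu_i, 1)$. The function and variable names here are illustrative, not from the paper.

```python
import random

def success_prob(allocation, cutoff):
    """Success probability of one arm: linear in the allocated resource
    until it reaches one at the (learner-unknown) cut-off value, after
    which any extra resource is wasted."""
    return min(allocation / cutoff, 1.0)

def play_round(allocations, cutoffs, rng=random):
    """One iteration. Semi-bandit feedback: the learner observes the
    success/failure outcome of every arm, not just an aggregate."""
    return [rng.random() < success_prob(a, c)
            for a, c in zip(allocations, cutoffs)]

# Example: three arms with hidden cut-offs summing to the unit budget.
cutoffs = [0.2, 0.3, 0.5]
allocations = [0.2, 0.3, 0.5]  # exactly meets each cut-off
outcomes = play_round(allocations, cutoffs)  # every arm succeeds here
```

In this example the total budget exactly covers all cut-offs, so the optimal allocation makes every arm succeed with probability one; the learner's difficulty is that the `cutoffs` vector is unobserved and must be inferred from the per-arm feedback.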

Cite this Paper


BibTeX
@InProceedings{pmlr-v83-dagan18a,
  title     = {A Better Resource Allocation Algorithm with Semi-Bandit Feedback},
  author    = {Dagan, Yuval and Crammer, Koby},
  booktitle = {Proceedings of Algorithmic Learning Theory},
  pages     = {268--320},
  year      = {2018},
  editor    = {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik},
  volume    = {83},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--09 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v83/dagan18a/dagan18a.pdf},
  url       = {https://proceedings.mlr.press/v83/dagan18a.html},
  abstract  = {We study a sequential resource allocation problem between a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a given arm increases the probability that it succeeds, yet with a cut-off. We follow Lattimore et al. (2014) and assume that the probability increases linearly until it equals one, after which allocating more of the resource is wasteful. These cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of $O(\log n)$, improving over the best known bound of $O(\log^2 n)$. Lower bounds we prove show that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.}
}
Endnote
%0 Conference Paper %T A Better Resource Allocation Algorithm with Semi-Bandit Feedback %A Yuval Dagan %A Koby Crammer %B Proceedings of Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2018 %E Firdaus Janoos %E Mehryar Mohri %E Karthik Sridharan %F pmlr-v83-dagan18a %I PMLR %P 268--320 %U https://proceedings.mlr.press/v83/dagan18a.html %V 83 %X We study a sequential resource allocation problem between a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a given arm increases the probability that it succeeds, yet with a cut-off. We follow Lattimore et al. (2014) and assume that the probability increases linearly until it equals one, after which allocating more of the resource is wasteful. These cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of $O(\log n)$, improving over the best known bound of $O(\log^2 n)$. Lower bounds we prove show that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.
APA
Dagan, Y. & Crammer, K. (2018). A Better Resource Allocation Algorithm with Semi-Bandit Feedback. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:268-320. Available from https://proceedings.mlr.press/v83/dagan18a.html.