A Better Resource Allocation Algorithm with Semi-Bandit Feedback

Yuval Dagan; Crammer Koby

A Better Resource Allocation Algorithm with Semi-Bandit Feedback

Yuval Dagan, Crammer Koby

Proceedings of Algorithmic Learning Theory, PMLR 83:268-320, 2018.

Abstract

We study a sequential resource allocation problem between a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a given arm increases the probability that it succeeds, yet with a cut-off. We follow Lattimore et al (2014) and assume that the probability increases linearly until it equals one, after which allocating more of the resource is wasteful. These cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of $O(\log n)$ improving over the best known bound of $O(\log^2 n)$. Lower bounds we prove show that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.

Cite this Paper

BibTeX

@InProceedings{pmlr-v83-dagan18a,
  title = 	 {A Better Resource Allocation Algorithm with Semi-Bandit Feedback},
  author = 	 {Dagan, Yuval and Koby, Crammer},
  booktitle = 	 {Proceedings of Algorithmic Learning Theory},
  pages = 	 {268--320},
  year = 	 {2018},
  editor = 	 {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik},
  volume = 	 {83},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--09 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v83/dagan18a/dagan18a.pdf},
  url = 	 {https://proceedings.mlr.press/v83/dagan18a.html},
  abstract = 	 {We study a sequential resource allocation problem between a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a given arm increases the probability that it succeeds, yet with a cut-off. We follow Lattimore et al (2014) and assume that the probability increases linearly until it equals one, after which allocating more of the resource is wasteful. These cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of $O(\log n)$ improving over the best known bound of $O(\log^2 n)$. Lower bounds we prove show that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.}
}

Endnote

%0 Conference Paper
%T A Better Resource Allocation Algorithm with Semi-Bandit Feedback
%A Yuval Dagan
%A Crammer Koby
%B Proceedings of Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Firdaus Janoos
%E Mehryar Mohri
%E Karthik Sridharan	
%F pmlr-v83-dagan18a
%I PMLR
%P 268--320
%U https://proceedings.mlr.press/v83/dagan18a.html
%V 83
%X We study a sequential resource allocation problem between a fixed number of arms. On each iteration the algorithm distributes a resource among the arms in order to maximize the expected success rate. Allocating more of the resource to a given arm increases the probability that it succeeds, yet with a cut-off. We follow Lattimore et al (2014) and assume that the probability increases linearly until it equals one, after which allocating more of the resource is wasteful. These cut-off values are fixed and unknown to the learner. We present an algorithm for this problem and prove a regret upper bound of $O(\log n)$ improving over the best known bound of $O(\log^2 n)$. Lower bounds we prove show that our upper bound is tight. Simulations demonstrate the superiority of our algorithm.

APA

Dagan, Y. & Koby, C.. (2018). A Better Resource Allocation Algorithm with Semi-Bandit Feedback. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:268-320 Available from https://proceedings.mlr.press/v83/dagan18a.html.

Related Material

Download PDF