Bandit Regret Scaling with the Effective Loss Range

Nicolò Cesa-Bianchi; Ohad Shamir

Proceedings of Algorithmic Learning Theory, PMLR 83:128-151, 2018.

Abstract

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved if the effective range of the losses in each round is small (for example, the maximal difference between two losses in a given round). Despite a recent impossibility result, we show how this can be made possible under certain mild additional assumptions, such as availability of rough estimates of the losses, or knowledge of the loss of a single, possibly unspecified arm, at the end of each round. Along the way, we develop a novel technique, which might be of independent interest, to convert any multi-armed bandit algorithm with regret depending on the loss range to an algorithm with regret depending only on the effective range, while attaining better regret bounds than existing approaches.

Cite this Paper

BibTeX


@InProceedings{pmlr-v83-cesa-bianchi18a,
  title = 	 {Bandit Regret Scaling with the Effective Loss Range},
  author = 	 {Cesa-Bianchi, Nicolò and Shamir, Ohad},
  booktitle = 	 {Proceedings of Algorithmic Learning Theory},
  pages = 	 {128--151},
  year = 	 {2018},
  editor = 	 {Janoos, Firdaus and Mohri, Mehryar and Sridharan, Karthik},
  volume = 	 {83},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {07--09 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v83/cesa-bianchi18a/cesa-bianchi18a.pdf},
  url = 	 {https://proceedings.mlr.press/v83/cesa-bianchi18a.html},
  abstract = 	 {We study how the regret guarantees of nonstochastic multi-armed 
 bandits can be improved, if the effective range of the losses in each round is 
 small (for example, the maximal difference between two losses in a given 
 round). Despite a recent impossibility result, we show how this can be made 
 possible under certain mild additional assumptions, such as availability of 
 rough estimates of the losses, or knowledge of the loss of a single, possibly 
 unspecified arm, at the end of each round. Along the way, we develop a novel 
 technique which might be of independent interest, to convert any multi-armed 
 bandit algorithm with regret depending on the loss range, to an algorithm with 
 regret depending only on the effective range, while attaining better regret 
 bounds than existing approaches.}
}

Endnote

%0 Conference Paper
%T Bandit Regret Scaling with the Effective Loss Range
%A Nicolò Cesa-Bianchi
%A Ohad Shamir
%B Proceedings of Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2018
%E Firdaus Janoos
%E Mehryar Mohri
%E Karthik Sridharan
%F pmlr-v83-cesa-bianchi18a
%I PMLR
%P 128--151
%U https://proceedings.mlr.press/v83/cesa-bianchi18a.html
%V 83
%X We study how the regret guarantees of nonstochastic multi-armed 
 bandits can be improved, if the effective range of the losses in each round is 
 small (for example, the maximal difference between two losses in a given 
 round). Despite a recent impossibility result, we show how this can be made 
 possible under certain mild additional assumptions, such as availability of 
 rough estimates of the losses, or knowledge of the loss of a single, possibly 
 unspecified arm, at the end of each round. Along the way, we develop a novel 
 technique which might be of independent interest, to convert any multi-armed 
 bandit algorithm with regret depending on the loss range, to an algorithm with 
 regret depending only on the effective range, while attaining better regret 
 bounds than existing approaches.

APA


Cesa-Bianchi, N. & Shamir, O. (2018). Bandit Regret Scaling with the Effective Loss Range. Proceedings of Algorithmic Learning Theory, in Proceedings of Machine Learning Research 83:128-151. Available from https://proceedings.mlr.press/v83/cesa-bianchi18a.html.
