Prediction with Limited Advice and Multiarmed Bandits with Paid Observations

Yevgeny Seldin; Peter Bartlett; Koby Crammer; Yasin Abbasi-Yadkori

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):280-287, 2014.

Abstract

We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on T rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
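
To get a feel for how the two worst-case rates quoted in the abstract scale, the following minimal sketch evaluates the expressions inside the O(·) with all constants dropped. The function names are hypothetical and chosen for illustration only; this is not the paper's algorithm, just arithmetic on the stated bounds.

```python
import math


def limited_advice_rate(N, M, T):
    """Order of sqrt((N/M) * T * ln N); constants are omitted (assumption)."""
    return math.sqrt((N / M) * T * math.log(N))


def paid_observations_rate(N, c, T):
    """Order of (c*N*ln N)^(1/3) * T^(2/3) + sqrt(T * ln N); constants omitted."""
    cost_term = (c * N * math.log(N)) ** (1 / 3) * T ** (2 / 3)
    free_term = math.sqrt(T * math.log(N))
    return cost_term + free_term


if __name__ == "__main__":
    N, T = 16, 10_000
    for M in (1, 4, 16):
        print(f"M={M:2d}  limited-advice rate ~ {limited_advice_rate(N, M, T):.1f}")
    for c in (0.0, 0.1, 1.0):
        print(f"c={c:.1f}  paid-observations rate ~ {paid_observations_rate(N, c, T):.1f}")
```

Setting M = N in the first expression, or c = 0 in the second, reduces both to $\sqrt{T \ln N}$, so the two formulas agree when every expert is queried or when observations are free.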

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-seldin14,
  title = 	 {Prediction with Limited Advice and Multiarmed Bandits with Paid Observations},
  author = 	 {Seldin, Yevgeny and Bartlett, Peter and Crammer, Koby and Abbasi-Yadkori, Yasin},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {280--287},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {1},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Beijing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/seldin14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/seldin14.html},
  abstract = 	 {We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on T rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.}
}

Endnote

%0 Conference Paper
%T Prediction with Limited Advice and Multiarmed Bandits with Paid Observations
%A Yevgeny Seldin
%A Peter Bartlett
%A Koby Crammer
%A Yasin Abbasi-Yadkori
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-seldin14
%I PMLR
%P 280--287
%U https://proceedings.mlr.press/v32/seldin14.html
%V 32
%N 1
%X We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on T rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.

RIS


TY  - CPAPER
TI  - Prediction with Limited Advice and Multiarmed Bandits with Paid Observations
AU  - Yevgeny Seldin
AU  - Peter Bartlett
AU  - Koby Crammer
AU  - Yasin Abbasi-Yadkori
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-seldin14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 280
EP  - 287
L1  - http://proceedings.mlr.press/v32/seldin14.pdf
UR  - https://proceedings.mlr.press/v32/seldin14.html
AB  - We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of M out of N experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on T rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial N-armed bandit game, where on round t of the game we can observe the reward of any number of arms, but each observation has a cost c. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on T rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
ER  -

APA


Seldin, Y., Bartlett, P., Crammer, K. & Abbasi-Yadkori, Y. (2014). Prediction with Limited Advice and Multiarmed Bandits with Paid Observations. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):280-287. Available from https://proceedings.mlr.press/v32/seldin14.html.
