Prediction with Limited Advice and Multiarmed Bandits with Paid Observations

Yevgeny Seldin, Peter Bartlett, Koby Crammer, Yasin Abbasi-Yadkori
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(1):280-287, 2014.

Abstract

We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of $M$ out of $N$ experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on $T$ rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial $N$-armed bandit game, where on round $t$ of the game we can observe the reward of any number of arms, but each observation has a cost $c$. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on $T$ rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
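To get a feel for how the two regret rates in the abstract scale, the following minimal sketch evaluates them numerically. It is illustrative only: the function names are hypothetical, the constants hidden by the $O(\cdot)$ notation are dropped (so the values are orders of magnitude, not guarantees), and nothing here implements the algorithms of the paper.

```python
import math


def limited_advice_rate(N: int, M: int, T: int) -> float:
    """Order of the limited-advice bound: sqrt((N/M) * T * ln N).

    Hypothetical helper; constant factors from the O(.) are omitted.
    """
    return math.sqrt((N / M) * T * math.log(N))


def paid_observations_rate(N: int, c: float, T: int) -> float:
    """Order of the paid-observations bound:
    (c * N * ln N)^(1/3) * T^(2/3) + sqrt(T * ln N).

    Hypothetical helper; constant factors from the O(.) are omitted.
    """
    cost_term = (c * N * math.log(N)) ** (1 / 3) * T ** (2 / 3)
    free_term = math.sqrt(T * math.log(N))
    return cost_term + free_term


if __name__ == "__main__":
    N, T = 16, 10_000
    # Querying more experts per round shrinks the rate by a factor sqrt(M).
    for M in (1, 4, 16):
        print(f"M={M:2d}  limited-advice rate ~ {limited_advice_rate(N, M, T):.1f}")
    # A larger observation cost c inflates the T^(2/3) term.
    for c in (0.0, 0.01, 1.0):
        print(f"c={c:<4}  paid-observations rate ~ {paid_observations_rate(N, c, T):.1f}")
```

Two sanity checks follow directly from the formulas: with $M = N$ the first rate reduces to $\sqrt{T \ln N}$, and with $c = 0$ the second rate reduces to the same quantity, since the $T^{2/3}$ term vanishes.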

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-seldin14,
  title = {Prediction with Limited Advice and Multiarmed Bandits with Paid Observations},
  author = {Seldin, Yevgeny and Bartlett, Peter and Crammer, Koby and Abbasi-Yadkori, Yasin},
  booktitle = {Proceedings of the 31st International Conference on Machine Learning},
  pages = {280--287},
  year = {2014},
  editor = {Xing, Eric P. and Jebara, Tony},
  volume = {32},
  number = {1},
  series = {Proceedings of Machine Learning Research},
  address = {Beijing, China},
  month = {22--24 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v32/seldin14.pdf},
  url = {https://proceedings.mlr.press/v32/seldin14.html},
  abstract = {We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of $M$ out of $N$ experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on $T$ rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial $N$-armed bandit game, where on round $t$ of the game we can observe the reward of any number of arms, but each observation has a cost $c$. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on $T$ rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.}
}
Endnote
%0 Conference Paper
%T Prediction with Limited Advice and Multiarmed Bandits with Paid Observations
%A Yevgeny Seldin
%A Peter Bartlett
%A Koby Crammer
%A Yasin Abbasi-Yadkori
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara
%F pmlr-v32-seldin14
%I PMLR
%P 280--287
%U https://proceedings.mlr.press/v32/seldin14.html
%V 32
%N 1
%X We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of $M$ out of $N$ experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on $T$ rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial $N$-armed bandit game, where on round $t$ of the game we can observe the reward of any number of arms, but each observation has a cost $c$. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on $T$ rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
RIS
TY  - CPAPER
TI  - Prediction with Limited Advice and Multiarmed Bandits with Paid Observations
AU  - Yevgeny Seldin
AU  - Peter Bartlett
AU  - Koby Crammer
AU  - Yasin Abbasi-Yadkori
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/01/27
ED  - Eric P. Xing
ED  - Tony Jebara
ID  - pmlr-v32-seldin14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 1
SP  - 280
EP  - 287
L1  - http://proceedings.mlr.press/v32/seldin14.pdf
UR  - https://proceedings.mlr.press/v32/seldin14.html
AB  - We study two problems of online learning under restricted information access. In the first problem, \emph{prediction with limited advice}, we consider a game of prediction with expert advice, where on each round of the game we query the advice of a subset of $M$ out of $N$ experts. We present an algorithm that achieves $O(\sqrt{(N/M) T \ln N})$ regret on $T$ rounds of this game. The second problem, the \emph{multiarmed bandit with paid observations}, is a variant of the adversarial $N$-armed bandit game, where on round $t$ of the game we can observe the reward of any number of arms, but each observation has a cost $c$. We present an algorithm that achieves $O((cN \ln N)^{1/3} T^{2/3} + \sqrt{T \ln N})$ regret on $T$ rounds of this game in the worst case. Furthermore, we present a number of refinements that treat arm- and time-dependent observation costs and achieve lower regret under benign conditions. We present lower bounds that show that, apart from the logarithmic factors, the worst-case regret bounds cannot be improved.
ER  -
APA
Seldin, Y., Bartlett, P., Crammer, K. & Abbasi-Yadkori, Y. (2014). Prediction with Limited Advice and Multiarmed Bandits with Paid Observations. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(1):280-287. Available from https://proceedings.mlr.press/v32/seldin14.html.
