Multi-armed Bandit Problems with History

Pannagadatta Shivaswamy, Thorsten Joachims
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:1046-1054, 2012.

Abstract

In this paper we consider the stochastic multi-armed bandit problem. However, unlike in the conventional version of this problem, we do not assume that the algorithm starts from scratch. Many applications offer observations of (some of) the arms even before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this data. An upper bound on the regret is derived in each case. The results show that a logarithmic amount of historic data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms is demonstrated on a large-scale malicious URL detection problem.
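As a rough illustration of the setting described in the abstract, the sketch below warm-starts a UCB1-style policy with historical arm observations, so that pre-existing data enters both the empirical means and the pull counts. The function name, the pull callback, and the history format are illustrative assumptions of this sketch; it is not one of the three algorithms proposed and analyzed in the paper.

import math
import random

def ucb1_with_history(pull, num_arms, horizon, history):
    """UCB1-style policy warm-started with historical observations.

    pull(arm) draws one reward in [0, 1] from the chosen arm, and
    history maps an arm index to a list of rewards observed before the
    algorithm starts. Illustrative sketch only, not the paper's algorithms.
    """
    counts = [len(history.get(a, [])) for a in range(num_arms)]
    sums = [float(sum(history.get(a, []))) for a in range(num_arms)]

    for _ in range(horizon):
        unseen = [a for a in range(num_arms) if counts[a] == 0]
        if unseen:
            # Arms with no online or historic observations are tried first.
            arm = unseen[0]
        else:
            total = sum(counts)
            # Standard UCB index: empirical mean plus a confidence radius;
            # historic pulls enter both the mean and the counts.
            arm = max(range(num_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(total) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts, sums

# Example: two Bernoulli arms with some historic data on the first arm.
if __name__ == "__main__":
    means = [0.4, 0.6]
    pull = lambda a: 1.0 if random.random() < means[a] else 0.0
    counts, sums = ucb1_with_history(pull, num_arms=2, horizon=1000,
                                     history={0: [1.0, 0.0, 1.0]})
    print(counts, [s / c for s, c in zip(sums, counts)])

Intuitively, arms with enough historic observations start with tight confidence intervals and attract little further exploration, which is consistent with the abstract's claim that a logarithmic amount of historic data can reduce regret from logarithmic to constant.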

Cite this Paper


BibTeX
@InProceedings{pmlr-v22-shivaswamy12,
  title     = {Multi-armed Bandit Problems with History},
  author    = {Shivaswamy, Pannagadatta and Joachims, Thorsten},
  booktitle = {Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics},
  pages     = {1046--1054},
  year      = {2012},
  editor    = {Lawrence, Neil D. and Girolami, Mark},
  volume    = {22},
  series    = {Proceedings of Machine Learning Research},
  address   = {La Palma, Canary Islands},
  month     = {21--23 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v22/shivaswamy12/shivaswamy12.pdf},
  url       = {https://proceedings.mlr.press/v22/shivaswamy12.html},
  abstract  = {In this paper we consider the stochastic multi-armed bandit problem. However, unlike in the conventional version of this problem, we do not assume that the algorithm starts from scratch. Many applications offer observations of (some of) the arms even before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this data. An upper bound on the regret is derived in each case. The results show that a logarithmic amount of historic data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms is demonstrated on a large-scale malicious URL detection problem.}
}
Endnote
%0 Conference Paper
%T Multi-armed Bandit Problems with History
%A Pannagadatta Shivaswamy
%A Thorsten Joachims
%B Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2012
%E Neil D. Lawrence
%E Mark Girolami
%F pmlr-v22-shivaswamy12
%I PMLR
%P 1046--1054
%U https://proceedings.mlr.press/v22/shivaswamy12.html
%V 22
%X In this paper we consider the stochastic multi-armed bandit problem. However, unlike in the conventional version of this problem, we do not assume that the algorithm starts from scratch. Many applications offer observations of (some of) the arms even before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this data. An upper bound on the regret is derived in each case. The results show that a logarithmic amount of historic data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms is demonstrated on a large-scale malicious URL detection problem.
RIS
TY - CPAPER
TI - Multi-armed Bandit Problems with History
AU - Pannagadatta Shivaswamy
AU - Thorsten Joachims
BT - Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics
DA - 2012/03/21
ED - Neil D. Lawrence
ED - Mark Girolami
ID - pmlr-v22-shivaswamy12
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 22
SP - 1046
EP - 1054
L1 - http://proceedings.mlr.press/v22/shivaswamy12/shivaswamy12.pdf
UR - https://proceedings.mlr.press/v22/shivaswamy12.html
AB - In this paper we consider the stochastic multi-armed bandit problem. However, unlike in the conventional version of this problem, we do not assume that the algorithm starts from scratch. Many applications offer observations of (some of) the arms even before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this data. An upper bound on the regret is derived in each case. The results show that a logarithmic amount of historic data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms is demonstrated on a large-scale malicious URL detection problem.
ER -
APA
Shivaswamy, P. & Joachims, T. (2012). Multi-armed Bandit Problems with History. Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 22:1046-1054. Available from https://proceedings.mlr.press/v22/shivaswamy12.html.
