Multi-armed Bandit Problems with History


Pannagadatta Shivaswamy, Thorsten Joachims;
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:1046-1054, 2012.

Abstract

In this paper we consider the stochastic multi-armed bandit problem. However, unlike in the conventional version of this problem, we do not assume that the algorithm starts from scratch. Many applications offer observations of (some of) the arms even before the algorithm starts. We propose three novel multi-armed bandit algorithms that can exploit this data. An upper bound on the regret is derived in each case. The results show that a logarithmic amount of historic data can reduce regret from logarithmic to constant. The effectiveness of the proposed algorithms is demonstrated on a large-scale malicious URL detection problem.
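To illustrate the idea of exploiting historic observations, here is a minimal sketch of a UCB1-style bandit whose per-arm counts and reward sums are warm-started from historical data. This is a hypothetical illustration of warm-starting, not the paper's three algorithms or its regret analysis; the function names and Bernoulli-reward setup are assumptions for the example.

```python
import math
import random

def ucb1_with_history(arm_means, history, horizon, seed=0):
    """Run a UCB1-style policy for `horizon` steps on Bernoulli arms.

    arm_means: true success probability of each arm (used only to
               simulate rewards and to compute regret).
    history:   one list of past observed rewards per arm; the policy's
               counts and sums are initialized from these, which is the
               warm-start idea (a sketch, not the paper's algorithms).
    Returns the cumulative expected regret over the online phase.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [len(h) for h in history]   # warm-started pull counts
    sums = [float(sum(h)) for h in history]  # warm-started reward sums
    best = max(arm_means)
    regret = 0.0
    t = sum(counts)
    for _ in range(horizon):
        t += 1
        untried = [i for i in range(k) if counts[i] == 0]
        if untried:
            # An arm with no data at all must be tried once.
            arm = untried[0]
        else:
            # Standard UCB1 index: empirical mean + exploration bonus.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return regret
```

With enough historical samples per arm, the exploration phase is largely paid for offline, which is the intuition behind the logarithmic-history-to-constant-regret result stated in the abstract.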
