An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models

Lihong Li; Wei Chu; John Langford; Taesup Moon; Xuanhui Wang

An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models

Lihong Li, Wei Chu, John Langford, Taesup Moon, Xuanhui Wang

Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, PMLR 26:19-36, 2012.

Abstract

Contextual bandit algorithms have become popular tools in online recommendation and advertising systems. \emphOffline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their “partial-label” nature. A common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually unavoidably introduced. The purpose of this paper is two-fold. First, we review a recently proposed \emphoffline evaluation technique. Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations. We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on \emphgeneralized linear models. Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary.

Cite this Paper

BibTeX


@InProceedings{pmlr-v26-li12a,
  title = 	 {An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models},
  author = 	 {Li, Lihong and Chu, Wei and Langford, John and Moon, Taesup and Wang, Xuanhui},
  booktitle = 	 {Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2},
  pages = 	 {19--36},
  year = 	 {2012},
  editor = 	 {Glowacka, Dorota and Dorard, Louis and Shawe-Taylor, John},
  volume = 	 {26},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bellevue, Washington, USA},
  month = 	 {02 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v26/li12a/li12a.pdf},
  url = 	 {https://proceedings.mlr.press/v26/li12a.html},
  abstract = 	 {Contextual bandit algorithms have become popular tools in online   recommendation and advertising systems. \emphOffline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their “partial-label” nature.  A common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually unavoidably introduced. The purpose of this paper is two-fold.  First, we review a recently proposed \emphoffline evaluation technique.  Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations.  We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on \emphgeneralized linear models.  Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary.}
}

Endnote

%0 Conference Paper
%T An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models
%A Lihong Li
%A Wei Chu
%A John Langford
%A Taesup Moon
%A Xuanhui Wang
%B Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2
%C Proceedings of Machine Learning Research
%D 2012
%E Dorota Glowacka
%E Louis Dorard
%E John Shawe-Taylor	
%F pmlr-v26-li12a
%I PMLR
%P 19--36
%U https://proceedings.mlr.press/v26/li12a.html
%V 26
%X Contextual bandit algorithms have become popular tools in online   recommendation and advertising systems. \emphOffline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their “partial-label” nature.  A common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually unavoidably introduced. The purpose of this paper is two-fold.  First, we review a recently proposed \emphoffline evaluation technique.  Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations.  We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on \emphgeneralized linear models.  Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary.

RIS


TY  - CPAPER
TI  - An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models
AU  - Lihong Li
AU  - Wei Chu
AU  - John Langford
AU  - Taesup Moon
AU  - Xuanhui Wang
BT  - Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2
DA  - 2012/05/02
ED  - Dorota Glowacka
ED  - Louis Dorard
ED  - John Shawe-Taylor	
ID  - pmlr-v26-li12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 26
SP  - 19
EP  - 36
L1  - http://proceedings.mlr.press/v26/li12a/li12a.pdf
UR  - https://proceedings.mlr.press/v26/li12a.html
AB  - Contextual bandit algorithms have become popular tools in online   recommendation and advertising systems. \emphOffline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their “partial-label” nature.  A common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually unavoidably introduced. The purpose of this paper is two-fold.  First, we review a recently proposed \emphoffline evaluation technique.  Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations.  We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on \emphgeneralized linear models.  Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary.
ER  -

APA


Li, L., Chu, W., Langford, J., Moon, T. & Wang, X.. (2012). An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models. Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, in Proceedings of Machine Learning Research 26:19-36 Available from https://proceedings.mlr.press/v26/li12a.html.

Related Material

Download PDF