- title: 'Preface' abstract: 'Preface to the Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2 July 2, 2011, Bellevue, Washington, USA.' volume: 26 URL: https://proceedings.mlr.press/v26/glowacka12a.html PDF: http://proceedings.mlr.press/v26/glowacka12a/glowacka12a.pdf edit: https://github.com/mlresearch//v26/edit/gh-pages/_posts/2012-05-02-glowacka12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2' publisher: 'PMLR' author: - given: Dorota family: Głowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor editor: - given: Dorota family: Glowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor address: Bellevue, Washington, USA page: i-i id: glowacka12a issued: date-parts: - 2012 - 5 - 2 firstpage: i lastpage: i published: 2012-05-02 00:00:00 +0000 - title: 'Online Clustering with Experts' abstract: 'We propose an online clustering algorithm that manages the exploration/exploitation tradeoff using an adaptive weighting over batch clustering algorithms. We extend algorithms for online supervised learning, with access to expert predictors, to the unsupervised learning setting. Instead of computing prediction errors in order to re-weight the experts, the algorithm computes an approximation to the current value of the \emphk-means objective obtained by each expert. When the experts are batch clustering algorithms with \emphb-approximation guarantees with respect to the \emphk-means objective (for example, the \emphk-means++ or \emphk-means # algorithms), applied to a sliding window of the data stream, our algorithm achieves an approximation guarantee with respect to the \emphk-means objective. The form of this online clustering approximation guarantee is novel, and extends an evaluation framework proposed by Dasgupta as an analog to regret. Our algorithm tracks the best clustering algorithm on real and simulated data sets.' volume: 26 URL: https://proceedings.mlr.press/v26/choromanska12a.html PDF: http://proceedings.mlr.press/v26/choromanska12a/choromanska12a.pdf edit: https://github.com/mlresearch//v26/edit/gh-pages/_posts/2012-05-02-choromanska12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2' publisher: 'PMLR' author: - given: Anna family: Choromanska - given: Claire family: Monteleoni editor: - given: Dorota family: Glowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor address: Bellevue, Washington, USA page: 1-18 id: choromanska12a issued: date-parts: - 2012 - 5 - 2 firstpage: 1 lastpage: 18 published: 2012-05-02 00:00:00 +0000 - title: 'An Unbiased Offline Evaluation of Contextual Bandit Algorithms with Generalized Linear Models' abstract: 'Contextual bandit algorithms have become popular tools in online recommendation and advertising systems. \emphOffline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their “partial-label” nature. A common practice is to create a simulator which simulates the online environment for the problem at hand and then run an algorithm against this simulator. However, creating the simulator itself is often difficult and modeling bias is usually unavoidably introduced. The purpose of this paper is two-fold. First, we review a recently proposed \emphoffline evaluation technique. Different from simulator-based approaches, the method is completely data-driven, is easy to adapt to different applications, and more importantly, provides provably unbiased evaluations. We argue for the wide use of this technique as standard practice when comparing bandit algorithms in real-life problems. Second, as an application of this technique, we compare and validate a number of new algorithms based on \emphgeneralized linear models. Experiments using real Yahoo! data suggest substantial improvement over algorithms with linear models when the rewards are binary.' volume: 26 URL: https://proceedings.mlr.press/v26/li12a.html PDF: http://proceedings.mlr.press/v26/li12a/li12a.pdf edit: https://github.com/mlresearch//v26/edit/gh-pages/_posts/2012-05-02-li12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2' publisher: 'PMLR' author: - given: Lihong family: Li - given: Wei family: Chu - given: John family: Langford - given: Taesup family: Moon - given: Xuanhui family: Wang editor: - given: Dorota family: Glowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor address: Bellevue, Washington, USA page: 19-36 id: li12a issued: date-parts: - 2012 - 5 - 2 firstpage: 19 lastpage: 36 published: 2012-05-02 00:00:00 +0000 - title: 'Exploration and Exploitation with Insufficient Resources' abstract: 'In physical experimentation, the resources available to discover new knowledge are typically extremely small in comparison to the size and dimensionality of the parameter spaces that can be searched. Additionally, due to the nature of physical experimentation, experimental errors will occur, particularly in biochemical experimentation where the reactants may undetectably denature, or reactant contamination could occur or equipment failure. These errors mean that not all experimental measurements and observations will be accurate or representative of the system being investigated. As the validity of observations is not guaranteed, resources must be split between exploration to discover new knowledge and exploitation to test the validity of the new knowledge. Currently we are investigating the automation of discovery in physical experimentation, with the aim of producing a fully autonomous closed-loop robotic machine capable of autonomous experimentation. This machine will build and evaluate hypotheses, determine experiments to perform and then perform them on an automated lab-on-chip experimentation platform for biochemical response characterisation. In the present work we examine how the trade-off between exploration and exploitation can occur in a situation where the number of experiments that can be performed is extremely small and where the observations returned are sometimes erroneous or unrepresentative of the behaviour being examined. To manage this trade-off we consider the use of a Bayesian notion of surprise, which is used to perform exploration experiments whilst observations are unsurprising from the predictions that can be made and exploits when observations are surprising as they do not match the predicted response.' volume: 26 URL: https://proceedings.mlr.press/v26/lovell12a.html PDF: http://proceedings.mlr.press/v26/lovell12a/lovell12a.pdf edit: https://github.com/mlresearch//v26/edit/gh-pages/_posts/2012-05-02-lovell12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2' publisher: 'PMLR' author: - given: Chris family: Lovell - given: Gareth family: Jones - given: Klaus-Peter family: Zauner - given: Steve R. family: Gunn editor: - given: Dorota family: Glowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor address: Bellevue, Washington, USA page: 37-61 id: lovell12a issued: date-parts: - 2012 - 5 - 2 firstpage: 37 lastpage: 61 published: 2012-05-02 00:00:00 +0000 - title: 'ICML Exploration & Exploitation Challenge: Keep it simple!' abstract: 'Recommendation has become a key feature in the economy of a lot of companies (online shopping, search engines...). There is a lot of work going on regarding recommender systems and there is still a lot to do to improve them. Indeed nowadays in many companies most of the job is done by hand. Moreover even when a supposedly smart recommender system is designed, it is hard to evaluate it without using real audience which obviously involves economic issues. The ICML Exploration & Exploitation challenge is an attempt to make people propose efficient recommendation techniques and particularly focuses on limited computational resources. The challenge also proposes a framework to address the problem of evaluating a recommendation algorithm with real data. We took part in this challenge and achieved the best performances; this paper aims at reporting on this achievement; we also discuss the evaluation process and propose a better one for future challenges of the same kind.' volume: 26 URL: https://proceedings.mlr.press/v26/nicol12a.html PDF: http://proceedings.mlr.press/v26/nicol12a/nicol12a.pdf edit: https://github.com/mlresearch//v26/edit/gh-pages/_posts/2012-05-02-nicol12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2' publisher: 'PMLR' author: - given: Olivier family: Nicol - given: Jérémie family: Mary - given: Philippe family: Preux editor: - given: Dorota family: Glowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor address: Bellevue, Washington, USA page: 62-85 id: nicol12a issued: date-parts: - 2012 - 5 - 2 firstpage: 62 lastpage: 85 published: 2012-05-02 00:00:00 +0000 - title: 'Stumping along a Summary for Exploration & Exploitation Challenge 2011' abstract: 'The \emphPascal Exploration & Exploitation challenge 2011 seeks to evaluate algorithms for the online website content selection problem. This article presents the solution we used to achieve second place in this challenge and some side-experiments we performed. The methods we evaluated are all structured in three layers. The first layer provides an online summary of the data stream for continuous and nominal data. Continuous data are handled using an online quantile summary. Nominal data are summarized with a hash-based counting structure. With these techniques, we managed to build an accurate stream summary with a small memory footprint. The second layer uses the summary to build predictors. We exploited several kinds of trees from simple decision stumps to deep multivariate ones. For the last layer, we explored several combination strategies: online bagging, exponential weighting, linear ranker, and simple averaging.' volume: 26 URL: https://proceedings.mlr.press/v26/salperwyck12a.html PDF: http://proceedings.mlr.press/v26/salperwyck12a/salperwyck12a.pdf edit: https://github.com/mlresearch//v26/edit/gh-pages/_posts/2012-05-02-salperwyck12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2' publisher: 'PMLR' author: - given: Christophe family: Salperwyck - given: Tanguy family: Urvoy editor: - given: Dorota family: Glowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor address: Bellevue, Washington, USA page: 86-97 id: salperwyck12a issued: date-parts: - 2012 - 5 - 2 firstpage: 86 lastpage: 97 published: 2012-05-02 00:00:00 +0000 - title: 'PAC-Bayes-Bernstein Inequality for Martingales and its Application to Multiarmed Bandits' abstract: 'We develop a new tool for data-dependent analysis of the exploration-exploitation trade-off in learning under limited feedback. Our tool is based on two main ingredients. The first ingredient is a new concentration inequality that makes it possible to control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. The second ingredient is an application of this inequality to the exploration-exploitation trade-off via importance weighted sampling. We apply the new tool to the stochastic multiarmed bandit problem, however, the main importance of this paper is the development and understanding of the new tool rather than improvement of existing algorithms for stochastic multiarmed bandits. In the follow-up work we demonstrate that the new tool can improve over state-of-the-art in structurally richer problems, such as stochastic multiarmed bandits with side information.' volume: 26 URL: https://proceedings.mlr.press/v26/seldin12a.html PDF: http://proceedings.mlr.press/v26/seldin12a/seldin12a.pdf edit: https://github.com/mlresearch//v26/edit/gh-pages/_posts/2012-05-02-seldin12a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2' publisher: 'PMLR' author: - given: Yevgeny family: Seldin - given: Nicolò family: Cesa-Bianchi - given: Peter family: Auer - given: François family: Laviolette - given: John family: Shawe-Taylor editor: - given: Dorota family: Glowacka - given: Louis family: Dorard - given: John family: Shawe-Taylor address: Bellevue, Washington, USA page: 98-111 id: seldin12a issued: date-parts: - 2012 - 5 - 2 firstpage: 98 lastpage: 111 published: 2012-05-02 00:00:00 +0000