Meta-learning with Stochastic Linear Bandits

Leonardo Cella; Alessandro Lazaric; Massimiliano Pontil

Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1360-1370, 2020.

Abstract

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
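As a rough illustration of the regularization the abstract describes, the sketch below computes a least-squares estimate that is penalized by its squared Euclidean distance to a bias vector. It covers only this estimation step, not the confidence sets or optimistic action selection of OFUL, and it is not the authors' code. The names `biased_ridge`, `solve`, `h`, and `lam` are hypothetical and chosen only for this example.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def biased_ridge(X, y, h, lam):
    """Minimize sum_t (x_t . theta - y_t)^2 + lam * ||theta - h||^2.

    Closed form: theta = (X^T X + lam I)^{-1} (X^T y + lam h).
    X is a list of feature vectors, y the observed rewards,
    h the bias vector (assumed given), lam > 0 the regularization weight.
    """
    d = len(h)
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [lam * h[i] for i in range(d)]
    for x_t, y_t in zip(X, y):
        for i in range(d):
            b[i] += x_t[i] * y_t
            for j in range(d):
                A[i][j] += x_t[i] * x_t[j]
    return solve(A, b)
```

With no observations the estimate coincides with the bias `h`, and setting `h` to the zero vector recovers ordinary ridge regression. This is one way to see why a bias close to the task parameters can help when tasks have low variance.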

Cite this Paper

BibTeX


@InProceedings{pmlr-v119-cella20a,
  title = 	 {Meta-learning with Stochastic Linear Bandits},
  author =       {Cella, Leonardo and Lazaric, Alessandro and Pontil, Massimiliano},
  booktitle = 	 {Proceedings of the 37th International Conference on Machine Learning},
  pages = 	 {1360--1370},
  year = 	 {2020},
  editor = 	 {Daumé, III, Hal and Singh, Aarti},
  volume = 	 {119},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--18 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v119/cella20a/cella20a.pdf},
  url = 	 {https://proceedings.mlr.press/v119/cella20a.html},
  abstract = 	 {We investigate meta-learning procedures in the setting of stochastic linear bandits tasks. The goal is to select a learning algorithm which works well on average over a class of bandits tasks, that are sampled from a task-distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a square euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.}
}

Endnote

%0 Conference Paper
%T Meta-learning with Stochastic Linear Bandits
%A Leonardo Cella
%A Alessandro Lazaric
%A Massimiliano Pontil
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-cella20a
%I PMLR
%P 1360--1370
%U https://proceedings.mlr.press/v119/cella20a.html
%V 119
%X We investigate meta-learning procedures in the setting of stochastic linear bandits tasks. The goal is to select a learning algorithm which works well on average over a class of bandits tasks, that are sampled from a task-distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a square euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.

APA


Cella, L., Lazaric, A., & Pontil, M. (2020). Meta-learning with Stochastic Linear Bandits. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:1360-1370. Available from https://proceedings.mlr.press/v119/cella20a.html.
