Online Stochastic Optimization  under Correlated Bandit Feedback

Mohammad Gheshlaghi azar; Alessandro Lazaric; Emma Brunskill

Online Stochastic Optimization under Correlated Bandit Feedback

Mohammad Gheshlaghi azar, Alessandro Lazaric, Emma Brunskill

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):1557-1565, 2014.

Abstract

In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime \mathcal X-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state-of-the-art in terms of the memory requirement, as well as requiring a weaker smoothness assumption on the mean-reward function in comparison with the existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-azar14,
  title = 	 {Online Stochastic Optimization  under Correlated Bandit Feedback},
  author = 	 {azar, Mohammad Gheshlaghi and Lazaric, Alessandro and Brunskill, Emma},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {1557--1565},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/azar14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/azar14.html},
  abstract = 	 {In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime \mathcal X-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state-of-the-art in terms of the memory requirement, as well as requiring a weaker smoothness assumption on the mean-reward function in comparison with the existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.}
}

Endnote

%0 Conference Paper
%T Online Stochastic Optimization  under Correlated Bandit Feedback
%A Mohammad Gheshlaghi azar
%A Alessandro Lazaric
%A Emma Brunskill
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-azar14
%I PMLR
%P 1557--1565
%U https://proceedings.mlr.press/v32/azar14.html
%V 32
%N 2
%X In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime \mathcal X-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state-of-the-art in terms of the memory requirement, as well as requiring a weaker smoothness assumption on the mean-reward function in comparison with the existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.

RIS


TY  - CPAPER
TI  - Online Stochastic Optimization  under Correlated Bandit Feedback
AU  - Mohammad Gheshlaghi azar
AU  - Alessandro Lazaric
AU  - Emma Brunskill
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-azar14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 1557
EP  - 1565
L1  - http://proceedings.mlr.press/v32/azar14.pdf
UR  - https://proceedings.mlr.press/v32/azar14.html
AB  - In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime \mathcal X-armed bandit algorithm, and derive regret bounds matching the performance of state-of-the-art algorithms in terms of the dependency on number of steps and the near-optimality dimension. The main advantage of HCT is that it handles the challenging case of correlated bandit feedback (reward), whereas existing methods require rewards to be conditionally independent. HCT also improves on the state-of-the-art in terms of the memory requirement, as well as requiring a weaker smoothness assumption on the mean-reward function in comparison with the existing anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in reinforcement learning and we report preliminary empirical results.
ER  -

APA


azar, M.G., Lazaric, A. & Brunskill, E.. (2014). Online Stochastic Optimization  under Correlated Bandit Feedback. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):1557-1565 Available from https://proceedings.mlr.press/v32/azar14.html.

Online Stochastic Optimization under Correlated Bandit Feedback

Abstract

Cite this Paper

Related Material