Identifying near-optimal decisions in linear-in-parameter bandit models with continuous decision sets

Sanjay P. Bhat; Chaitanya Amballa

Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:181-190, 2022.

Abstract

We consider an online optimization problem in a bandit setting in which a learner chooses decisions from a continuous decision set at discrete decision epochs, and receives noisy rewards from the environment in response. While the noise samples are assumed to be independent and sub-Gaussian, the mean reward at each epoch is a fixed but unknown linear function of a feature vector, which depends on the decision through a known (and possibly nonlinear) feature map. We study the problem within the framework of best-arm identification with fixed confidence, and provide a template algorithm for approximately learning the optimal decision in a probably approximately correct (PAC) setting. More precisely, the template algorithm samples the decision space till a stopping condition is met, and returns a subset of decisions such that, with the required confidence, every element of the subset is approximately optimal for the unknown mean reward function. We provide a sample complexity bound for the template algorithm and then specialize it to the case where the mean-reward function is a univariate polynomial of a single decision variable. We provide an implementable algorithm for this case by explicitly instantiating all the steps in the template algorithm. Finally, we provide experimental results to demonstrate the efficacy of our algorithms.
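To make the setting in the abstract concrete, here is a minimal Python sketch of the sample-until-stop-then-return-a-subset pattern for a univariate polynomial mean reward. It is not the authors' algorithm. The uniform sampling rule, the fixed grid over an assumed decision set [0, 1], the confidence multiplier `beta`, the known noise scale `noise_sd`, the `eps / 4` stopping threshold, and the `reward_fn(xs, rng)` interface are all assumptions made for illustration, and the sketch carries no formal PAC guarantee.

```python
import numpy as np


def features(x, degree):
    """Polynomial feature map phi(x) = (1, x, ..., x^degree)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(x, degree + 1, increasing=True)


def near_optimal_set(reward_fn, degree, eps, beta=3.0, noise_sd=0.1,
                     batch=50, max_rounds=2000, grid_size=201, rng=None):
    """Return grid decisions whose estimated mean reward is near the best.

    Hypothetical interface: reward_fn(xs, rng) returns one noisy reward
    per decision in xs. Decisions are assumed to lie in [0, 1].
    """
    rng = np.random.default_rng(rng)
    grid = np.linspace(0.0, 1.0, grid_size)
    phi_grid = features(grid, degree)

    # Least-squares statistics; the tiny ridge term keeps V invertible.
    V = 1e-6 * np.eye(degree + 1)
    b = np.zeros(degree + 1)

    for _ in range(max_rounds):
        # Assumed sampling rule: uniform draws from the decision set.
        xs = rng.uniform(0.0, 1.0, size=batch)
        ys = reward_fn(xs, rng)
        P = features(xs, degree)
        V += P.T @ P
        b += P.T @ ys
        theta_hat = np.linalg.solve(V, b)

        # Heuristic width: beta * noise_sd * sqrt(phi(x)^T V^{-1} phi(x)).
        v_inv_phi = np.linalg.solve(V, phi_grid.T)
        width = beta * noise_sd * np.sqrt(np.sum(phi_grid.T * v_inv_phi, axis=0))

        # Assumed stopping condition: estimates are tight everywhere on the grid.
        if width.max() <= eps / 4:
            break

    est = phi_grid @ theta_hat
    keep = est >= est.max() - eps / 2
    return grid[keep], theta_hat
```

With a quadratic mean reward such as -(x - 0.3)^2, the returned decisions should cluster around x = 0.3. If the estimation error on the grid stays below `eps / 4`, every returned grid point is within `eps` of the best grid point, which is the kind of subset guarantee the abstract describes.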

Cite this Paper

BibTeX


@InProceedings{pmlr-v180-bhat22a,
  title = 	 {Identifying near-optimal decisions in linear-in-parameter bandit models with continuous decision sets},
  author =       {Bhat, Sanjay P. and Amballa, Chaitanya},
  booktitle = 	 {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence},
  pages = 	 {181--190},
  year = 	 {2022},
  editor = 	 {Cussens, James and Zhang, Kun},
  volume = 	 {180},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {01--05 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v180/bhat22a/bhat22a.pdf},
  url = 	 {https://proceedings.mlr.press/v180/bhat22a.html},
  abstract = 	 {We consider an online optimization problem in a bandit setting in which a learner chooses decisions from a continuous decision set at discrete decision epochs, and receives noisy rewards from the environment in response. While the noise samples are assumed to be independent and sub-Gaussian, the mean reward at each epoch is a fixed but unknown linear function of a feature vector, which depends on the decision through a known (and possibly nonlinear) feature map. We study the problem within the framework of best-arm identification with fixed confidence, and provide a template algorithm for approximately learning the optimal decision in a probably approximately correct (PAC) setting. More precisely, the template algorithm samples the decision space till a stopping condition is met, and returns a subset of decisions such that, with the required confidence, every element of the subset is approximately optimal for the unknown mean reward function. We provide a sample complexity bound for the template algorithm and then specialize it to the case where the mean-reward function is a univariate polynomial of a single decision variable. We provide an implementable algorithm for this case by explicitly instantiating all the steps in the template algorithm. Finally, we provide experimental results to demonstrate the efficacy of our algorithms.}
}

Endnote

%0 Conference Paper
%T Identifying near-optimal decisions in linear-in-parameter bandit models with continuous decision sets
%A Sanjay P. Bhat
%A Chaitanya Amballa
%B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2022
%E James Cussens
%E Kun Zhang
%F pmlr-v180-bhat22a
%I PMLR
%P 181--190
%U https://proceedings.mlr.press/v180/bhat22a.html
%V 180
%X We consider an online optimization problem in a bandit setting in which a learner chooses decisions from a continuous decision set at discrete decision epochs, and receives noisy rewards from the environment in response. While the noise samples are assumed to be independent and sub-Gaussian, the mean reward at each epoch is a fixed but unknown linear function of a feature vector, which depends on the decision through a known (and possibly nonlinear) feature map. We study the problem within the framework of best-arm identification with fixed confidence, and provide a template algorithm for approximately learning the optimal decision in a probably approximately correct (PAC) setting. More precisely, the template algorithm samples the decision space till a stopping condition is met, and returns a subset of decisions such that, with the required confidence, every element of the subset is approximately optimal for the unknown mean reward function. We provide a sample complexity bound for the template algorithm and then specialize it to the case where the mean-reward function is a univariate polynomial of a single decision variable. We provide an implementable algorithm for this case by explicitly instantiating all the steps in the template algorithm. Finally, we provide experimental results to demonstrate the efficacy of our algorithms.

APA


Bhat, S. P., & Amballa, C. (2022). Identifying near-optimal decisions in linear-in-parameter bandit models with continuous decision sets. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:181-190. Available from https://proceedings.mlr.press/v180/bhat22a.html.
