On the Finite-Time Performance of the Knowledge Gradient Algorithm

Yanwen Li; Siyang Gao

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12741-12764, 2022.

Abstract

The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Because KG is complex to calculate, theoretical analysis of the algorithm is difficult, and existing results mostly concern its asymptotic performance, e.g., consistency and asymptotic sample allocation. In this research, we present new theoretical results on the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive lower and upper bounds for the probability of error and the simple regret of the algorithm. With these bounds, existing asymptotic results become simple corollaries. We also show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Finally, we use numerical experiments to further demonstrate the finite-time behavior of the KG algorithm.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-li22f,
  title = 	 {On the Finite-Time Performance of the Knowledge Gradient Algorithm},
  author =       {Li, Yanwen and Gao, Siyang},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {12741--12764},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/li22f/li22f.pdf},
  url = 	 {https://proceedings.mlr.press/v162/li22f.html},
  abstract = 	 {The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Due to the complex calculation of KG, theoretical analysis of this algorithm is difficult, and existing results are mostly about the asymptotic performance of it, e.g., consistency, asymptotic sample allocation, etc. In this research, we present new theoretical results about the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive lower bounds and upper bounds for the probability of error and simple regret of the algorithm. With these bounds, existing asymptotic results become simple corollaries. We also show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Last, we use numerical experiments to further demonstrate the finite-time behavior of the KG algorithm.}
}

Endnote

%0 Conference Paper
%T On the Finite-Time Performance of the Knowledge Gradient Algorithm
%A Yanwen Li
%A Siyang Gao
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-li22f
%I PMLR
%P 12741--12764
%U https://proceedings.mlr.press/v162/li22f.html
%V 162
%X The knowledge gradient (KG) algorithm is a popular and effective algorithm for the best arm identification (BAI) problem. Due to the complex calculation of KG, theoretical analysis of this algorithm is difficult, and existing results are mostly about the asymptotic performance of it, e.g., consistency, asymptotic sample allocation, etc. In this research, we present new theoretical results about the finite-time performance of the KG algorithm. Under independent and normally distributed rewards, we derive lower bounds and upper bounds for the probability of error and simple regret of the algorithm. With these bounds, existing asymptotic results become simple corollaries. We also show the performance of the algorithm for the multi-armed bandit (MAB) problem. These developments not only extend the existing analysis of the KG algorithm, but can also be used to analyze other improvement-based algorithms. Last, we use numerical experiments to further demonstrate the finite-time behavior of the KG algorithm.

APA


Li, Y. & Gao, S. (2022). On the Finite-Time Performance of the Knowledge Gradient Algorithm. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:12741-12764. Available from https://proceedings.mlr.press/v162/li22f.html.
