Information-Directed Selection for Top-Two Algorithms

Wei You, Chao Qin, Zihao Wang, Shuoguang Yang
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:2850-2851, 2023.

Abstract

We consider the best-k-arm identification problem for multi-armed bandits, where the objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort. We characterize the necessary and sufficient conditions for the optimal allocation using dual variables. Remarkably these optimality conditions lead to the extension of top-two algorithm design principle (Russo, 2020), initially proposed for best-arm identification. Furthermore, our optimality conditions induce a simple and effective selection rule dubbed information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain. As a theoretical guarantee, we prove that integrated with IDS, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature (Russo, 2020). As a by-product, we show that for $k > 1$, top-two algorithms cannot achieve optimality even with an oracle tuning parameter. Numerical experiments show the superior performance of the proposed top-two algorithms with IDS and considerable improvement compared with algorithms without adaptive selection.

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-you23a, title = {Information-Directed Selection for Top-Two Algorithms}, author = {You, Wei and Qin, Chao and Wang, Zihao and Yang, Shuoguang}, booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory}, pages = {2850--2851}, year = {2023}, editor = {Neu, Gergely and Rosasco, Lorenzo}, volume = {195}, series = {Proceedings of Machine Learning Research}, month = {12--15 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v195/you23a/you23a.pdf}, url = {https://proceedings.mlr.press/v195/you23a.html}, abstract = {We consider the best-k-arm identification problem for multi-armed bandits, where the objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort. We characterize the necessary and sufficient conditions for the optimal allocation using dual variables. Remarkably these optimality conditions lead to the extension of top-two algorithm design principle (Russo, 2020), initially proposed for best-arm identification. Furthermore, our optimality conditions induce a simple and effective selection rule dubbed information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain. As a theoretical guarantee, we prove that integrated with IDS, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature (Russo, 2020). As a by-product, we show that for $k > 1$, top-two algorithms cannot achieve optimality even with an oracle tuning parameter. Numerical experiments show the superior performance of the proposed top-two algorithms with IDS and considerable improvement compared with algorithms without adaptive selection.} }
Endnote
%0 Conference Paper %T Information-Directed Selection for Top-Two Algorithms %A Wei You %A Chao Qin %A Zihao Wang %A Shuoguang Yang %B Proceedings of Thirty Sixth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2023 %E Gergely Neu %E Lorenzo Rosasco %F pmlr-v195-you23a %I PMLR %P 2850--2851 %U https://proceedings.mlr.press/v195/you23a.html %V 195 %X We consider the best-k-arm identification problem for multi-armed bandits, where the objective is to select the exact set of k arms with the highest mean rewards by sequentially allocating measurement effort. We characterize the necessary and sufficient conditions for the optimal allocation using dual variables. Remarkably these optimality conditions lead to the extension of top-two algorithm design principle (Russo, 2020), initially proposed for best-arm identification. Furthermore, our optimality conditions induce a simple and effective selection rule dubbed information-directed selection (IDS) that selects one of the top-two candidates based on a measure of information gain. As a theoretical guarantee, we prove that integrated with IDS, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature (Russo, 2020). As a by-product, we show that for $k > 1$, top-two algorithms cannot achieve optimality even with an oracle tuning parameter. Numerical experiments show the superior performance of the proposed top-two algorithms with IDS and considerable improvement compared with algorithms without adaptive selection.
APA
You, W., Qin, C., Wang, Z. & Yang, S.. (2023). Information-Directed Selection for Top-Two Algorithms. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:2850-2851 Available from https://proceedings.mlr.press/v195/you23a.html.

Related Material