Best arm identification in multi-armed bandits with delayed feedback

Aditya Grover, Todor Markov, Peter Attia, Norman Jin, Nicolas Perkins, Bryan Cheong, Michael Chen, Zi Yang, Stephen Harris, William Chueh, Stefano Ermon
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:833-842, 2018.

Abstract

In this paper, we propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedbacks. The delay in feedbacks increases the effective sample complexity of the algorithm, but can be offset by partial feedbacks received before a pull is completed. We propose a a general modeling framework to structure in the partial feedbacks, and as a special case we introduce efficient algorithms for best arm identification in settings where the partial feedbacks are biased or unbiased estimators of the final outcome of the pull. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting where an agent can control a batch of arms. Experiments on simulated as well as real world datasets of policy search for charging chemical batteries and hyperparameter optimization for mixed integer programming demonstrate that exploiting the structure of partial and delayed feedbacks can lead to significant improvements over baselines on both sequential and parallel MAB.

Cite this Paper


BibTeX
@InProceedings{pmlr-v84-grover18b, title = {Best arm identification in multi-armed bandits with delayed feedback}, author = {Grover, Aditya and Markov, Todor and Attia, Peter and Jin, Norman and Perkins, Nicolas and Cheong, Bryan and Chen, Michael and Yang, Zi and Harris, Stephen and Chueh, William and Ermon, Stefano}, booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics}, pages = {833--842}, year = {2018}, editor = {Storkey, Amos and Perez-Cruz, Fernando}, volume = {84}, series = {Proceedings of Machine Learning Research}, month = {09--11 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v84/grover18b/grover18b.pdf}, url = {https://proceedings.mlr.press/v84/grover18b.html}, abstract = {In this paper, we propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedbacks. The delay in feedbacks increases the effective sample complexity of the algorithm, but can be offset by partial feedbacks received before a pull is completed. We propose a a general modeling framework to structure in the partial feedbacks, and as a special case we introduce efficient algorithms for best arm identification in settings where the partial feedbacks are biased or unbiased estimators of the final outcome of the pull. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting where an agent can control a batch of arms. Experiments on simulated as well as real world datasets of policy search for charging chemical batteries and hyperparameter optimization for mixed integer programming demonstrate that exploiting the structure of partial and delayed feedbacks can lead to significant improvements over baselines on both sequential and parallel MAB.} }
Endnote
%0 Conference Paper %T Best arm identification in multi-armed bandits with delayed feedback %A Aditya Grover %A Todor Markov %A Peter Attia %A Norman Jin %A Nicolas Perkins %A Bryan Cheong %A Michael Chen %A Zi Yang %A Stephen Harris %A William Chueh %A Stefano Ermon %B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2018 %E Amos Storkey %E Fernando Perez-Cruz %F pmlr-v84-grover18b %I PMLR %P 833--842 %U https://proceedings.mlr.press/v84/grover18b.html %V 84 %X In this paper, we propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedbacks. The delay in feedbacks increases the effective sample complexity of the algorithm, but can be offset by partial feedbacks received before a pull is completed. We propose a a general modeling framework to structure in the partial feedbacks, and as a special case we introduce efficient algorithms for best arm identification in settings where the partial feedbacks are biased or unbiased estimators of the final outcome of the pull. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting where an agent can control a batch of arms. Experiments on simulated as well as real world datasets of policy search for charging chemical batteries and hyperparameter optimization for mixed integer programming demonstrate that exploiting the structure of partial and delayed feedbacks can lead to significant improvements over baselines on both sequential and parallel MAB.
APA
Grover, A., Markov, T., Attia, P., Jin, N., Perkins, N., Cheong, B., Chen, M., Yang, Z., Harris, S., Chueh, W. & Ermon, S.. (2018). Best arm identification in multi-armed bandits with delayed feedback. Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 84:833-842 Available from https://proceedings.mlr.press/v84/grover18b.html.

Related Material