Best Arm Identification with Safety Constraints

Zhenlin Wang, Andrew J. Wagenmaker, Kevin Jamieson
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:9114-9146, 2022.

Abstract

The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real-world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many, while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint takes a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness.

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-wang22h, title = { Best Arm Identification with Safety Constraints }, author = {Wang, Zhenlin and Wagenmaker, Andrew J. and Jamieson, Kevin}, booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics}, pages = {9114--9146}, year = {2022}, editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel}, volume = {151}, series = {Proceedings of Machine Learning Research}, month = {28--30 Mar}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v151/wang22h/wang22h.pdf}, url = {https://proceedings.mlr.press/v151/wang22h.html}, abstract = { The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real-world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many, while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint takes a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness. } }
Endnote
%0 Conference Paper %T Best Arm Identification with Safety Constraints %A Zhenlin Wang %A Andrew J. Wagenmaker %A Kevin Jamieson %B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2022 %E Gustau Camps-Valls %E Francisco J. R. Ruiz %E Isabel Valera %F pmlr-v151-wang22h %I PMLR %P 9114--9146 %U https://proceedings.mlr.press/v151/wang22h.html %V 151 %X The best arm identification problem in the multi-armed bandit setting is an excellent model of many real-world decision-making problems, yet it fails to capture the fact that in the real-world, safety constraints often must be met while learning. In this work we study the question of best-arm identification in safety-critical settings, where the goal of the agent is to find the best safe option out of many, while exploring in a way that guarantees certain, initially unknown safety constraints are met. We first analyze this problem in the setting where the reward and safety constraint takes a linear structure, and show nearly matching upper and lower bounds. We then analyze a much more general version of the problem where we only assume the reward and safety constraint can be modeled by monotonic functions, and propose an algorithm in this setting which is guaranteed to learn safely. We conclude with experimental results demonstrating the effectiveness of our approaches in scenarios such as safely identifying the best drug out of many in order to treat an illness.
APA
Wang, Z., Wagenmaker, A.J. & Jamieson, K.. (2022). Best Arm Identification with Safety Constraints . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:9114-9146 Available from https://proceedings.mlr.press/v151/wang22h.html.

Related Material