Bandits for BMO Functions

Tianyu Wang, Cynthia Rudin
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:9996-10006, 2020.

Abstract

We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with singularities in the domain. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $\delta$-regret – a regret measured against an arm that is optimal after removing a $\delta$-sized portion of the arm space.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-wang20q, title = {Bandits for {BMO} Functions}, author = {Wang, Tianyu and Rudin, Cynthia}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {9996--10006}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/wang20q/wang20q.pdf}, url = {https://proceedings.mlr.press/v119/wang20q.html}, abstract = {We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with singularities in the domain. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $\delta$-regret – a regret measured against an arm that is optimal after removing a $\delta$-sized portion of the arm space.} }
Endnote
%0 Conference Paper %T Bandits for BMO Functions %A Tianyu Wang %A Cynthia Rudin %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-wang20q %I PMLR %P 9996--10006 %U https://proceedings.mlr.press/v119/wang20q.html %V 119 %X We study the bandit problem where the underlying expected reward is a Bounded Mean Oscillation (BMO) function. BMO functions are allowed to be discontinuous and unbounded, and are useful in modeling signals with singularities in the domain. We develop a toolset for BMO bandits, and provide an algorithm that can achieve poly-log $\delta$-regret – a regret measured against an arm that is optimal after removing a $\delta$-sized portion of the arm space.
APA
Wang, T. & Rudin, C.. (2020). Bandits for BMO Functions. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:9996-10006 Available from https://proceedings.mlr.press/v119/wang20q.html.

Related Material