Reducing Exploration of Dying Arms in Mortal Bandits

Stefano Tracà, Cynthia Rudin, Weiyu Yan
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:156-163, 2020.

Abstract

Mortal bandits have proven to be extremely useful for providing news article recommendations, running automated online advertising campaigns, and for other applications where the set of available options changes over time. Previous work on this problem showed how to regulate exploration of new arms when they have recently appeared, but it does not adapt when arms are about to disappear. Since in most applications we can determine either exactly or approximately when arms will disappear, we can leverage this information to improve performance: we should not be exploring arms that are about to disappear. We provide adaptations of algorithms, regret bounds, and experiments for this study, showing a clear benefit from regulating greed (exploration/exploitation) for arms that will soon disappear. We illustrate numerical performance on the Yahoo! Front Page Today Module User Click Log Dataset.
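The paper's actual algorithms and regret analysis are not reproduced on this page; the core idea, however — spend no exploration budget on arms whose remaining lifetime is too short to pay off — can be illustrated with a minimal epsilon-greedy sketch. Everything below is hypothetical: the `stats` layout, the `horizon` cutoff, and the function name are illustrative assumptions, not the authors' method.

```python
import random

def choose_arm(stats, now, horizon=10, eps=0.1):
    """Epsilon-greedy selection over mortal arms, refusing to *explore*
    arms that will die within `horizon` rounds (illustrative sketch).

    stats: dict mapping arm -> (pulls, total_reward, death_time).
    """
    # Only arms still alive at time `now` can be played at all.
    alive = {a: s for a, s in stats.items() if s[2] > now}
    if not alive:
        raise ValueError("no arms available")

    def mean(s):
        pulls, total, _death = s
        return total / pulls if pulls else 0.0

    # Exploitation target: best empirical mean among all living arms
    # (a dying arm may still be *exploited* if it looks best).
    best = max(alive, key=lambda a: mean(alive[a]))
    if random.random() >= eps:
        return best

    # Exploration is restricted to arms with enough life left to
    # benefit from the information gained.
    long_lived = [a for a, s in alive.items() if s[2] - now >= horizon]
    return random.choice(long_lived) if long_lived else best
```

For example, with an arm that dies at `t = 5` and `horizon = 10`, exploration rounds will never select it, while a long-lived arm with the best empirical mean is still returned on exploitation rounds.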

Cite this Paper

BibTeX
@InProceedings{pmlr-v115-traca20a,
  title     = {Reducing Exploration of Dying Arms in Mortal Bandits},
  author    = {Trac{\`{a}}, Stefano and Rudin, Cynthia and Yan, Weiyu},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {156--163},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/traca20a/traca20a.pdf},
  url       = {https://proceedings.mlr.press/v115/traca20a.html},
  abstract  = {Mortal bandits have proven to be extremely useful for providing news article recommendations, running automated online advertising campaigns, and for other applications where the set of available options changes over time. Previous work on this problem showed how to regulate exploration of new arms when they have recently appeared, but they do not adapt when the arms are about to disappear. Since in most applications we can determine either exactly or approximately when arms will disappear, we can leverage this information to improve performance: we should not be exploring arms that are about to disappear. We provide adaptations of algorithms, regret bounds, and experiments for this study, showing a clear benefit from regulating greed (exploration/exploitation) for arms that will soon disappear. We illustrate numerical performance on the Yahoo! Front Page Today Module User Click Log Dataset.}
}
Endnote
%0 Conference Paper
%T Reducing Exploration of Dying Arms in Mortal Bandits
%A Stefano Tracà
%A Cynthia Rudin
%A Weiyu Yan
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-traca20a
%I PMLR
%P 156--163
%U https://proceedings.mlr.press/v115/traca20a.html
%V 115
%X Mortal bandits have proven to be extremely useful for providing news article recommendations, running automated online advertising campaigns, and for other applications where the set of available options changes over time. Previous work on this problem showed how to regulate exploration of new arms when they have recently appeared, but they do not adapt when the arms are about to disappear. Since in most applications we can determine either exactly or approximately when arms will disappear, we can leverage this information to improve performance: we should not be exploring arms that are about to disappear. We provide adaptations of algorithms, regret bounds, and experiments for this study, showing a clear benefit from regulating greed (exploration/exploitation) for arms that will soon disappear. We illustrate numerical performance on the Yahoo! Front Page Today Module User Click Log Dataset.
APA
Tracà, S., Rudin, C. & Yan, W. (2020). Reducing Exploration of Dying Arms in Mortal Bandits. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:156-163. Available from https://proceedings.mlr.press/v115/traca20a.html.