Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P R Kumar
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2800-2809, 2019.

Abstract

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a “reneging” phenomenon, where participants may disengage from future interactions after observing an unsatisfactory outcome, is rather prevalent. To address this issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{T(\log T)^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.
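To make the model concrete, below is a minimal, illustrative Python sketch of the setting described in the abstract, not the paper's implementation: reward means are linear in the context, the noise standard deviation depends on the context (heteroscedasticity), and a user reneges once an outcome falls below her satisfaction level. The UCB index here is a generic ridge-regression bonus used as a simplified stand-in for the HR-UCB confidence bound derived in the paper; the names (theta, w, alpha, serve_user) and the specific variance form |w^T x| are assumptions made for illustration only.

    # Sketch of a heteroscedastic linear bandit with reneging (illustrative,
    # not the paper's HR-UCB): y = theta^T x + eps with Var(eps) = (w^T x)^2,
    # and the user leaves as soon as y falls below her satisfaction level beta.
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_arms, alpha = 5, 20, 1.0        # dimension, arms per round, UCB width
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)       # unknown mean parameter (assumed)
    w = 0.3 * rng.normal(size=d)         # unknown variance parameter (assumed)

    def serve_user(beta, horizon=200):
        """Interact with one user of satisfaction level beta until reneging."""
        V = np.eye(d)                    # ridge Gram matrix
        b = np.zeros(d)                  # accumulated responses
        lifetime = 0
        for _ in range(horizon):
            X = rng.normal(size=(n_arms, d)) / np.sqrt(d)  # contexts this round
            theta_hat = np.linalg.solve(V, b)
            Vinv = np.linalg.inv(V)
            # Optimistic index: estimated mean plus a generic exploration
            # bonus (simplified stand-in for the HR-UCB index).
            bonus = np.sqrt(np.einsum('id,dk,ik->i', X, Vinv, X))
            x = X[np.argmax(X @ theta_hat + alpha * bonus)]
            sigma = abs(w @ x)           # context-dependent (heteroscedastic) std
            y = theta @ x + sigma * rng.normal()
            V += np.outer(x, x)
            b += y * x
            lifetime += 1
            if y < beta:                 # outcome below satisfaction level:
                break                    # the user reneges and leaves
        return lifetime

    print("lifetime:", serve_user(beta=-0.5))

Maximizing lifetime rather than cumulative reward is what distinguishes this setting from a standard linear bandit: an outcome that is merely low does not just cost reward, it can end the interaction entirely, so the policy must weigh the reneging risk induced by the context-dependent variance.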

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-hsieh19a,
  title     = {Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging},
  author    = {Hsieh, Ping-Chun and Liu, Xi and Bhattacharya, Anirban and Kumar, P R},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2800--2809},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/hsieh19a/hsieh19a.pdf},
  url       = {https://proceedings.mlr.press/v97/hsieh19a.html},
  abstract  = {Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a “reneging” phenomenon, where participants may disengage from future interactions after observing an unsatisfactory outcome, is rather prevalent. To address this issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{T(\log T)^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.}
}
Endnote
%0 Conference Paper
%T Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
%A Ping-Chun Hsieh
%A Xi Liu
%A Anirban Bhattacharya
%A P R Kumar
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-hsieh19a
%I PMLR
%P 2800--2809
%U https://proceedings.mlr.press/v97/hsieh19a.html
%V 97
%X Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a “reneging” phenomenon, where participants may disengage from future interactions after observing an unsatisfactory outcome, is rather prevalent. To address this issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{T(\log T)^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.
APA
Hsieh, P., Liu, X., Bhattacharya, A. & Kumar, P.R. (2019). Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2800-2809. Available from https://proceedings.mlr.press/v97/hsieh19a.html.
