Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging

Ping-Chun Hsieh; Xi Liu; Anirban Bhattacharya; P R Kumar

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2800-2809, 2019.

Abstract

Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a “reneging” phenomenon, where participants may disengage from future interactions after observing an unsatisfiable outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{{T}(\log({T}))^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.

Cite this Paper

BibTeX

@InProceedings{pmlr-v97-hsieh19a,
  title = 	 {Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging},
  author =       {Hsieh, Ping-Chun and Liu, Xi and Bhattacharya, Anirban and Kumar, P R},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {2800--2809},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/hsieh19a/hsieh19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/hsieh19a.html},
  abstract = 	 {Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a “reneging” phenomenon, where participants may disengage from future interactions after observing an unsatisfiable outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{{T}(\log({T}))^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.}
}

Endnote

%0 Conference Paper
%T Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
%A Ping-Chun Hsieh
%A Xi Liu
%A Anirban Bhattacharya
%A P R Kumar
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-hsieh19a
%I PMLR
%P 2800--2809
%U https://proceedings.mlr.press/v97/hsieh19a.html
%V 97
%X Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a “reneging” phenomenon, where participants may disengage from future interactions after observing an unsatisfiable outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct “satisfaction level,” with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves $\mathcal{O}\big(\sqrt{{T}(\log({T}))^{3}}\big)$ regret. Finally, we validate the performance of HR-UCB via simulations.

APA

Hsieh, P., Liu, X., Bhattacharya, A. & Kumar, P.R. (2019). Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2800-2809. Available from https://proceedings.mlr.press/v97/hsieh19a.html.
