HR-Bandit: Human-AI Collaborated Linear Recourse Bandit

Junyu Cao, Ruijiang Gao, Esmaeil Keyvanshokooh
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:3016-3024, 2025.

Abstract

Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB (\textsf{RLinUCB}) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit (\textsf{HR-Bandit}), which integrates human expertise to enhance performance. \textsf{HR-Bandit} offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
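The abstract names the \textsf{RLinUCB} algorithm but gives no pseudocode. Purely as an illustration, a LinUCB-style step that scores candidate (action, feature-modification) pairs by an upper confidence bound and picks the best joint choice might look like the sketch below. The class name, the candidate-set interface, and all parameters are hypothetical reconstructions from standard LinUCB, not the paper's algorithm.

```python
import numpy as np

class RecourseLinUCBSketch:
    """Hypothetical sketch of a LinUCB variant that scores
    (action, modified-feature) pairs jointly. Reconstructed from
    standard LinUCB; the paper's RLinUCB may differ."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)       # reward-weighted feature sum
        self.alpha = alpha           # exploration weight

    def select(self, candidates):
        # candidates: iterable of (label, feature_vector) pairs, where
        # each vector encodes an action applied to a (possibly
        # recourse-modified) patient context.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b       # ridge estimate of reward model

        def ucb(x):
            # optimistic score: estimated reward + exploration bonus
            return x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)

        return max(candidates, key=lambda c: ucb(c[1]))

    def update(self, x, reward):
        # standard rank-one update after observing the reward
        self.A += np.outer(x, x)
        self.b += reward * x

# Toy usage (hypothetical): one action with and without a recourse
# that raises the second feature before treatment.
bandit = RecourseLinUCBSketch(dim=3, alpha=0.5)
ctx = np.array([1.0, 0.2, -0.1])
modified = np.array([1.0, 0.8, -0.1])
label, x = bandit.select([("as-is", ctx), ("after-recourse", modified)])
bandit.update(x, reward=1.0)
```

In this sketch the exploration-exploitation balance comes from the confidence-width term alpha * sqrt(x^T A^{-1} x); searching over modified feature vectors as well as actions is the "recourse" twist the abstract describes.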

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-cao25a,
  title     = {HR-Bandit: Human-AI Collaborated Linear Recourse Bandit},
  author    = {Cao, Junyu and Gao, Ruijiang and Keyvanshokooh, Esmaeil},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {3016--3024},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/cao25a/cao25a.pdf},
  url       = {https://proceedings.mlr.press/v258/cao25a.html},
  abstract  = {Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB (\textsf{RLinUCB}) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit (\textsf{HR-Bandit}), which integrates human expertise to enhance performance. \textsf{HR-Bandit} offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.}
}
Endnote
%0 Conference Paper
%T HR-Bandit: Human-AI Collaborated Linear Recourse Bandit
%A Junyu Cao
%A Ruijiang Gao
%A Esmaeil Keyvanshokooh
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-cao25a
%I PMLR
%P 3016--3024
%U https://proceedings.mlr.press/v258/cao25a.html
%V 258
%X Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB (\textsf{RLinUCB}) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit (\textsf{HR-Bandit}), which integrates human expertise to enhance performance. \textsf{HR-Bandit} offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
APA
Cao, J., Gao, R. & Keyvanshokooh, E. (2025). HR-Bandit: Human-AI Collaborated Linear Recourse Bandit. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:3016-3024. Available from https://proceedings.mlr.press/v258/cao25a.html.