Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:9506-9524, 2022.

Abstract

Understanding a decision-maker’s priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. Though conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time. For instance, as the medical community’s understanding of organ transplantation has progressed over the years, a pertinent question is: how have actual organ allocation policies been evolving? To answer it, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner. First, we model the evolving behavior of decision-makers in terms of contextual bandits, and formalize the problem of Inverse Contextual Bandits ("ICB"). Second, we propose two concrete algorithms as solutions, learning parametric and non-parametric representations of an agent’s behavior. Finally, using both real and simulated data for liver transplantation, we illustrate the applicability and explainability of our method, and benchmark and validate the accuracy of our algorithms.
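
To make the abstract's setup concrete, below is a minimal sketch of the kind of forward model ICB aims to invert. It is an illustration under assumptions, not the paper's algorithm: we assume a linear-reward contextual bandit, a softmax (Boltzmann) behavior policy, and online ridge-regression belief updates; all variable names (K, D, T, theta_hat, history) are chosen for this example.

# Illustrative sketch (assumptions, not the paper's method): an agent acts
# in a linear contextual bandit while its estimate of the reward parameters
# evolves with experience, so its policy is non-stationary over time.
import numpy as np

rng = np.random.default_rng(0)
K, D, T = 3, 5, 200                      # actions, context dim, time steps
theta_true = rng.normal(size=(K, D))     # true (unknown) reward parameters

A = np.stack([np.eye(D)] * K)            # per-action ridge precision
b = np.zeros((K, D))                     # per-action response vector
history = []                             # offline log of (context, action)

for t in range(T):
    x = rng.normal(size=D)                           # observed context
    theta_hat = np.stack([np.linalg.solve(A[k], b[k]) for k in range(K)])
    logits = theta_hat @ x                           # estimated reward per arm
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax behavior policy
    a = rng.choice(K, p=p)
    r = theta_true[a] @ x + 0.1 * rng.normal()       # noisy reward feedback

    A[a] += np.outer(x, x)               # online ridge update: the agent's
    b[a] += r * x                        # knowledge of the world drifts
    history.append((x, a))

# The inverse problem ICB poses: given only `history` (contexts and chosen
# actions, no rewards), recover how theta_hat, and hence the behavior
# policy, evolved over time.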

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-huyuk22a,
  title     = {Inverse Contextual Bandits: Learning How Behavior Evolves over Time},
  author    = {H{\"u}y{\"u}k, Alihan and Jarrett, Daniel and van der Schaar, Mihaela},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {9506--9524},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/huyuk22a/huyuk22a.pdf},
  url       = {https://proceedings.mlr.press/v162/huyuk22a.html}
}
Endnote
%0 Conference Paper
%T Inverse Contextual Bandits: Learning How Behavior Evolves over Time
%A Alihan Hüyük
%A Daniel Jarrett
%A Mihaela van der Schaar
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-huyuk22a
%I PMLR
%P 9506--9524
%U https://proceedings.mlr.press/v162/huyuk22a.html
%V 162
APA
Hüyük, A., Jarrett, D., & van der Schaar, M. (2022). Inverse Contextual Bandits: Learning How Behavior Evolves over Time. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:9506-9524. Available from https://proceedings.mlr.press/v162/huyuk22a.html.