Causal Bandits without prior knowledge using separating sets

Arnoud De Kroon, Joris Mooij, Danielle Belgrave
Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:407-427, 2022.

Abstract

The Causal Bandit is a variant of the classic Bandit problem where an agent must identify the best action in a sequential decision-making process, where the reward distribution of the actions displays a non-trivial dependence structure that is governed by a causal model. All methods proposed for this problem thus far in the literature rely on exact prior knowledge of the full causal graph. We formulate new causal bandit algorithms that no longer necessarily rely on prior causal knowledge. Instead, they utilize an estimator based on separating sets, which we can find using simple conditional independence tests or causal discovery methods. We show that, for discrete i.i.d. data, this estimator is unbiased, and has variance which is upper bounded by that of the sample mean. We develop algorithms based on Thompson Sampling and UCB for discrete and Gaussian models respectively and show increased performance on simulation data as well as on a bandit drawing from real-world protein signaling data.
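To illustrate the core idea behind a separating-set estimator for discrete data, the sketch below estimates an arm's mean reward by conditioning on a separating set variable Z and pooling reward observations across all arms. This is a minimal illustration of the general principle, not a reproduction of the paper's exact estimator or algorithms; the function name and interface are hypothetical.

```python
import numpy as np

def separating_set_estimate(actions, z, rewards, target_action, n_z):
    """Illustrative sketch (not the paper's exact estimator).

    If Z separates the action from the reward, then
        E[Y | a] = sum_z P(z | a) * E[Y | z],
    and E[Y | z] can be estimated from ALL samples, not only those
    where `target_action` was played -- which is the source of the
    variance reduction relative to the per-arm sample mean.
    """
    actions = np.asarray(actions)
    z = np.asarray(z)
    rewards = np.asarray(rewards)

    # P_hat(z | target_action): empirical distribution of Z on the
    # samples where the target action was actually played.
    mask = actions == target_action
    if not mask.any():
        return float("nan")
    p_z_given_a = np.bincount(z[mask], minlength=n_z) / mask.sum()

    # E_hat[Y | z]: pooled over every sample, regardless of arm.
    e_y_given_z = np.zeros(n_z)
    for v in range(n_z):
        zmask = z == v
        if zmask.any():
            e_y_given_z[v] = rewards[zmask].mean()

    return float(p_z_given_a @ e_y_given_z)
```

Note how the estimate for one arm can draw on rewards observed under other arms, as long as those samples share the same value of the separating set variable.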

Cite this Paper


BibTeX
@InProceedings{pmlr-v177-kroon22a,
  title     = {Causal Bandits without prior knowledge using separating sets},
  author    = {Kroon, Arnoud De and Mooij, Joris and Belgrave, Danielle},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  pages     = {407--427},
  year      = {2022},
  editor    = {Schölkopf, Bernhard and Uhler, Caroline and Zhang, Kun},
  volume    = {177},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--13 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v177/kroon22a/kroon22a.pdf},
  url       = {https://proceedings.mlr.press/v177/kroon22a.html},
  abstract  = {The Causal Bandit is a variant of the classic Bandit problem where an agent must identify the best action in a sequential decision-making process, where the reward distribution of the actions displays a non-trivial dependence structure that is governed by a causal model. All methods proposed for this problem thus far in the literature rely on exact prior knowledge of the full causal graph. We formulate new causal bandit algorithms that no longer necessarily rely on prior causal knowledge. Instead, they utilize an estimator based on separating sets, which we can find using simple conditional independence tests or causal discovery methods. We show that, for discrete i.i.d. data, this estimator is unbiased, and has variance which is upper bounded by that of the sample mean. We develop algorithms based on Thompson Sampling and UCB for discrete and Gaussian models respectively and show increased performance on simulation data as well as on a bandit drawing from real-world protein signaling data.}
}
Endnote
%0 Conference Paper
%T Causal Bandits without prior knowledge using separating sets
%A Arnoud De Kroon
%A Joris Mooij
%A Danielle Belgrave
%B Proceedings of the First Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2022
%E Bernhard Schölkopf
%E Caroline Uhler
%E Kun Zhang
%F pmlr-v177-kroon22a
%I PMLR
%P 407--427
%U https://proceedings.mlr.press/v177/kroon22a.html
%V 177
%X The Causal Bandit is a variant of the classic Bandit problem where an agent must identify the best action in a sequential decision-making process, where the reward distribution of the actions displays a non-trivial dependence structure that is governed by a causal model. All methods proposed for this problem thus far in the literature rely on exact prior knowledge of the full causal graph. We formulate new causal bandit algorithms that no longer necessarily rely on prior causal knowledge. Instead, they utilize an estimator based on separating sets, which we can find using simple conditional independence tests or causal discovery methods. We show that, for discrete i.i.d. data, this estimator is unbiased, and has variance which is upper bounded by that of the sample mean. We develop algorithms based on Thompson Sampling and UCB for discrete and Gaussian models respectively and show increased performance on simulation data as well as on a bandit drawing from real-world protein signaling data.
APA
Kroon, A.D., Mooij, J. & Belgrave, D. (2022). Causal Bandits without prior knowledge using separating sets. Proceedings of the First Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 177:407-427. Available from https://proceedings.mlr.press/v177/kroon22a.html.