[edit]
A causal bandit approach to learning good atomic interventions in presence of unobserved confounders
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1328-1338, 2022.
Abstract
We study the problem of determining the best atomic intervention in a Causal Bayesian Network (CBN) specified only by its causal graph. We model this as a stochastic multi-armed bandit (MAB) problem with side-information, where interventions on CBN correspond to arms of the bandit instance. First, we propose a simple regret minimization algorithm that takes as input a causal graph with observable and unobservable nodes and in T exploration rounds achieves ˜O(√m(C)/T) expected simple regret. Here m(C) is a parameter dependent on the input CBN C and could be much smaller than the number of arms. We also show that this is almost optimal for CBNs whose causal graphs have an n-ary tree structure. Next, we propose a cumulative regret minimization algorithm that takes as input a causal graph with observable nodes and performs better than the optimal MAB algorithms that do not use causal side-information. We experimentally compare both our algorithms with the best known algorithms in the literature.