Causal Bandits with Propagating Inference
[edit]
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:55125520, 2018.
Abstract
Bandit is a framework for designing sequential experiments, where a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$ in each experiment. Theoretically, the tight regret lowerbound for the general bandit is polynomial with respect to the number of arms $\mathcal{A}$, and thus, to overcome this bound, the bandit problem with sideinformation is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as sideinformation and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the $\Omega(\sqrt{\mathcal{A}/T})$ simpleregret lowerbound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ \gamma^*\log(\mathcal{A}T) / T})$ regret bound, where $\gamma^*$ is determined by using a causal graph structure. In particular, if the maximum indegree of the causal graph is a constant, then $\gamma^* = O(N^2)$, where $N$ is the number of nodes.
Related Material


