Causal Bandits with Propagating Inference

Akihiro Yabe; Daisuke Hatano; Hanna Sumita; Shinji Ito; Naonori Kakimura; Takuro Fukunaga; Ken-ichi Kawarabayashi

Proceedings of the 35th International Conference on Machine Learning, PMLR 80:5512-5520, 2018.

Abstract

The bandit problem is a framework for designing sequential experiments, in which a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$ in each experiment. Theoretically, the tight regret lower bound for the general bandit is polynomial in the number of arms $|\mathcal{A}|$; to overcome this bound, the bandit problem with side information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandits overcame the $\Omega(\sqrt{|\mathcal{A}|/T})$ simple-regret lower bound; however, they work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves an $O(\sqrt{ \gamma^*\log(|\mathcal{A}|T) / T})$ regret bound, where $\gamma^*$ is determined by the structure of the causal graph. In particular, if the maximum in-degree of the causal graph is a constant, then $\gamma^* = O(N^2)$, where $N$ is the number of nodes.
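
As a rough reading aid, the final claim of the abstract can be unpacked by substituting $\gamma^* = O(N^2)$ into the stated regret bound. The sketch below uses only the quantities named in the abstract. The closing comparison, with $|\mathcal{A}| = 2^N$, is a hypothetical illustration that is not taken from the paper, and constants are ignored throughout.

```latex
% Substitute \gamma^* = O(N^2) (constant maximum in-degree) into the bound:
\[
  O\!\left(\sqrt{\frac{\gamma^{*}\,\log(|\mathcal{A}|T)}{T}}\right)
  \;=\;
  O\!\left(\sqrt{\frac{N^{2}\,\log(|\mathcal{A}|T)}{T}}\right)
  \;=\;
  O\!\left(N\sqrt{\frac{\log(|\mathcal{A}|T)}{T}}\right).
\]
% Ignoring constants, this is of smaller order than \sqrt{|\mathcal{A}|/T} when
\[
  N^{2}\,\log(|\mathcal{A}|T) \;\ll\; |\mathcal{A}|.
\]
% Hypothetical illustration (an assumption, not from the paper): |\mathcal{A}| = 2^N.
\[
  \log(|\mathcal{A}|T) = N\log 2 + \log T
  \quad\Longrightarrow\quad
  O\!\left(N\sqrt{\frac{N + \log T}{T}}\right)
  \quad\text{versus}\quad
  \sqrt{\frac{2^{N}}{T}}.
\]
```

Under that assumed arm count, the dependence on the number of arms drops from exponential in $N$ to polynomial in $N$, which is the sense in which graph structure acts as useful side information here.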

Cite this Paper

BibTeX

@InProceedings{pmlr-v80-yabe18a,
  title     = {Causal Bandits with Propagating Inference},
  author    = {Yabe, Akihiro and Hatano, Daisuke and Sumita, Hanna and Ito, Shinji and Kakimura, Naonori and Fukunaga, Takuro and Kawarabayashi, Ken-ichi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {5512--5520},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/yabe18a/yabe18a.pdf},
  url       = {https://proceedings.mlr.press/v80/yabe18a.html},
  abstract = 	 {Bandit is a framework for designing sequential experiments, where a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$ in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms $|\mathcal{A}|$, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the $\Omega(\sqrt{|\mathcal{A}|/T})$ simple-regret lower-bound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ \gamma^*\log(|\mathcal{A}|T) / T})$ regret bound, where $\gamma^*$ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then $\gamma^* = O(N^2)$, where $N$ is the number of nodes.}
}

Endnote

%0 Conference Paper
%T Causal Bandits with Propagating Inference
%A Akihiro Yabe
%A Daisuke Hatano
%A Hanna Sumita
%A Shinji Ito
%A Naonori Kakimura
%A Takuro Fukunaga
%A Ken-ichi Kawarabayashi
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-yabe18a
%I PMLR
%P 5512--5520
%U https://proceedings.mlr.press/v80/yabe18a.html
%V 80
%X Bandit is a framework for designing sequential experiments, where a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$ in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms $|\mathcal{A}|$, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the $\Omega(\sqrt{|\mathcal{A}|/T})$ simple-regret lower-bound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ \gamma^*\log(|\mathcal{A}|T) / T})$ regret bound, where $\gamma^*$ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then $\gamma^* = O(N^2)$, where $N$ is the number of nodes.

APA

Yabe, A., Hatano, D., Sumita, H., Ito, S., Kakimura, N., Fukunaga, T. & Kawarabayashi, K. (2018). Causal Bandits with Propagating Inference. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:5512-5520. Available from https://proceedings.mlr.press/v80/yabe18a.html.
