Finding the bandit in a graph: Sequential search-and-stop
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1668-1677, 2019.
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are to both search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it and, at the same time, act near-optimally in order to collect as many hidden objects as possible.
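To make the setting concrete, the following minimal Python sketch simulates a single search instance. It is illustrative only: the function name `search_instance`, the fixed `stop_after` cutoff, and the greedy vertex choice are assumptions for the example and are not the paper's strategy; the abstract's quasi-optimal stationary strategy would replace both the greedy choice and the crude cutoff with an optimized search-and-stop rule.

```python
import random


def search_instance(in_neighbors, probs, stop_after, rng):
    """Simulate one search instance on a DAG.

    in_neighbors: dict mapping each vertex to the set of its in-neighbors.
    probs: dict mapping each vertex to the probability that the hidden
           object is located there (assumed known here; sums to 1).
    stop_after: give up after this many examinations (a crude stand-in
                for an optimized stopping rule).
    rng: a random.Random instance.

    Returns (found, cost): whether the object was found and how many
    examinations were spent on this instance.
    """
    # Draw the hidden object's location from the fixed distribution.
    vertices = list(probs)
    hidden = rng.choices(vertices, weights=[probs[v] for v in vertices])[0]

    examined = set()
    cost = 0
    while len(examined) < len(vertices):
        # Precedence constraint: a vertex is accessible only once all
        # of its in-neighbors have been examined.
        accessible = [v for v in vertices
                      if v not in examined and in_neighbors[v] <= examined]
        # Greedy policy (for illustration): examine the accessible
        # vertex most likely to hide the object.
        v = max(accessible, key=lambda u: probs[u])
        examined.add(v)
        cost += 1
        if v == hidden:
            return True, cost
        if cost >= stop_after:
            # Skip this instance and restart on a fresh one.
            return False, cost
    return False, cost


# Example: vertex "a" must be examined before "b" and "c"; "b" before "d".
in_nb = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b"}}
p = {"a": 0.1, "b": 0.2, "c": 0.6, "d": 0.1}
found, cost = search_instance(in_nb, p, stop_after=3, rng=random.Random(0))
```

Running many such independent instances under a shared budget on the total cost, and choosing when to stop each one, is the trade-off the paper optimizes; in the unknown-distribution case, `probs` would additionally have to be estimated from past instances.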