Finding the bandit in a graph: Sequential search-and-stop
Proceedings of Machine Learning Research, PMLR 89:1668-1677, 2019.
Abstract
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are both to search theory and to multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it and, at the same time, act near-optimally in order to collect as many hidden objects as possible.
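The search-and-stop setting described above can be illustrated with a minimal simulation. The sketch below is not the paper's algorithm: it assumes a toy path-shaped DAG (so the precedence constraint fixes the examination order), a known hiding distribution, and a naive fixed stopping rule (`stop_after`), where the paper instead derives a quasi-optimal stationary strategy. The function names and parameters are hypothetical.

```python
import random

def run_search_and_stop(order, probs, budget, stop_after, rng):
    """Simulate repeated search instances under a total time budget.

    order: examination order respecting the DAG precedence constraint
           (here a toy path, so the order is forced)
    probs: probs[v] = probability that the object hides at vertex v
    budget: total number of vertex examinations allowed across instances
    stop_after: naive stopping rule -- abandon the current instance and
                restart on a fresh one after this many examinations
    Returns the number of hidden objects found.
    """
    found = 0
    t = 0
    while t < budget:
        # A fresh, independent instance: sample the hidden location.
        target = rng.choices(range(len(order)), weights=probs)[0]
        for i, v in enumerate(order):
            if t >= budget:
                break
            t += 1  # examining a vertex costs one unit of budget
            if v == target:
                found += 1
                break
            if i + 1 >= stop_after:
                # Give up: the remaining vertices are deemed not worth
                # the time, so skip to a new instance.
                break
    return found

rng = random.Random(0)
order = [0, 1, 2, 3]              # path DAG 0 -> 1 -> 2 -> 3
probs = [0.6, 0.25, 0.1, 0.05]    # most of the mass near the root
print(run_search_and_stop(order, probs, budget=100, stop_after=2, rng=rng))
```

With most of the hiding probability on early vertices, stopping after two examinations trades occasional misses for many more attempted instances, which is exactly the tension the time-budget objective captures.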