The K-Nearest Neighbour UCB Algorithm for Multi-Armed Bandits with Covariates
Proceedings of Algorithmic Learning Theory, PMLR 83:725-752, 2018.
Abstract
In this paper we propose and explore the $k$-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates. We focus on a setting where covariates are supported on a metric space of low intrinsic dimension, such as a manifold embedded within a high dimensional ambient feature space. The algorithm is conceptually simple and straightforward to implement. Unlike previous methods such as the UCBogram and Adaptively Binned Successive Elimination, the $k$-Nearest Neighbour UCB algorithm does not require prior knowledge of the intrinsic dimension of the marginal distribution. It is also naturally anytime, without resorting to the doubling trick. We prove a regret bound for the $k$-Nearest Neighbour UCB algorithm which is minimax optimal up to logarithmic factors. In particular, the algorithm automatically takes advantage of both low intrinsic dimensionality of the marginal distribution over the covariates and low noise in the data, expressed as a margin condition. In addition, focusing on the case of bounded rewards, we give corresponding regret bounds for the $k$-Nearest Neighbour KL-UCB algorithm, which is an analogue of the KL-UCB algorithm adapted to the setting of multi-armed bandits with covariates. Finally, we present empirical results which demonstrate the ability of both the $k$-Nearest Neighbour UCB and $k$-Nearest Neighbour KL-UCB to take advantage of situations where the data is supported on an unknown submanifold of a high-dimensional feature space.
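To illustrate the general idea described in the abstract, the following is a minimal sketch of a nearest-neighbour UCB policy for bandits with covariates. It is not the paper's algorithm: the paper's method selects the number of neighbours adaptively and carries a precise confidence bonus, whereas this sketch fixes $k$ and uses a generic UCB-style bonus with a hypothetical exploration weight `alpha`. All class and parameter names are illustrative.

```python
import numpy as np

class KNNUCB:
    """Simplified k-NN UCB sketch (fixed k; the paper's algorithm
    chooses the number of neighbours adaptively)."""

    def __init__(self, n_arms, k=5, alpha=2.0):
        self.n_arms = n_arms
        self.k = k                # number of neighbours (fixed here for simplicity)
        self.alpha = alpha        # exploration weight (hypothetical parameter)
        self.covariates = [[] for _ in range(n_arms)]  # per-arm covariate history
        self.rewards = [[] for _ in range(n_arms)]     # per-arm reward history

    def select_arm(self, x, t):
        """Pick the arm with the largest k-NN upper confidence bound at covariate x."""
        x = np.asarray(x, dtype=float)
        ucbs = []
        for a in range(self.n_arms):
            n = len(self.rewards[a])
            if n == 0:
                return a  # pull each arm at least once before comparing
            k = min(self.k, n)
            # distances from x to all covariates previously seen when pulling arm a
            dists = np.linalg.norm(np.asarray(self.covariates[a]) - x, axis=1)
            nearest = np.argsort(dists)[:k]
            mean = float(np.mean(np.asarray(self.rewards[a])[nearest]))
            # generic UCB-style bonus: shrinks as more local data accumulates
            bonus = np.sqrt(self.alpha * np.log(max(t, 2)) / k)
            ucbs.append(mean + bonus)
        return int(np.argmax(ucbs))

    def update(self, arm, x, reward):
        """Record the covariate and observed reward for the pulled arm."""
        self.covariates[arm].append(np.asarray(x, dtype=float))
        self.rewards[arm].append(float(reward))
```

A usage loop would call `select_arm(x_t, t)` on each incoming covariate, observe the reward, and call `update`. Because estimates are local, the policy can respond to reward functions that vary over the covariate space, which is the feature the abstract's regret analysis makes precise.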