Thresholded linear bandits
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:6968-7020, 2023.
Abstract
We introduce the thresholded linear bandit problem, a novel sequential decision making problem at the interface of structured stochastic multi-armed bandits and learning halfspaces. The set of arms is $[0,1]^d$, the expected Bernoulli reward is piecewise constant with a jump at a separating hyperplane, and each arm is associated with a cost that is a positive linear combination of the arm’s components. This problem is motivated by several practical applications. For instance, imagine tuning the continuous features of an offer to a consumer; higher values incur higher cost to the vendor but result in a more attractive offer. At some threshold, the offer is attractive enough for a random consumer to accept at the higher probability level. For the one-dimensional case, we present Leftist, which enjoys $\log^2 T$ problem-dependent regret in favorable cases and has $\log(T)\sqrt{T}$ worst-case regret; we also give a lower bound suggesting this is unimprovable. We then present MD-Leftist, our extension of Leftist to the multi-dimensional case, which obtains similar regret bounds but with $d^{2.5}\log d$ and $d^{1.5}\log d$ dependence on dimension for the two types of bounds, respectively. Finally, we experimentally evaluate Leftist.
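To make the problem setup concrete, below is a minimal sketch (not taken from the paper) of the environment the abstract describes: arms in $[0,1]^d$, a Bernoulli mean that jumps at an unknown separating hyperplane, and a cost that is a positive linear combination of the arm's components. All names and parameter values (w, b, mu_low, mu_high, cost_weights) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3
w = rng.uniform(0.5, 1.5, size=d)        # hidden hyperplane normal (assumed positive)
b = 1.0                                   # hidden hyperplane offset
mu_low, mu_high = 0.2, 0.7                # Bernoulli means below / above the threshold
cost_weights = rng.uniform(0.1, 1.0, d)   # positive cost coefficients

def pull(arm: np.ndarray) -> tuple[int, float]:
    """Play an arm in [0,1]^d; return (Bernoulli reward, linear cost)."""
    mean = mu_high if w @ arm >= b else mu_low
    reward = int(rng.binomial(1, mean))
    cost = float(cost_weights @ arm)
    return reward, cost

# Example: a cheap arm (likely below the threshold) vs. a costly arm above it.
print(pull(np.full(d, 0.2)))
print(pull(np.full(d, 0.9)))
```

In this sketch the learner's trade-off is visible directly: components must be set high enough to cross the unknown hyperplane and earn the higher reward probability, but every increase in a component raises the linear cost.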