BudgetConstrained Bandits over General Cost and Reward Distributions
[edit]
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:43884398, 2020.
Abstract
We consider a budgetconstrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavytailed costreward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all costreward pairs, $O(\log B)$ regret is achievable for a budget $B>0$. In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum meansquare error estimation. We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problemdependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.
Related Material


