[edit]
Online Learning with Vector Costs and Bandits with Knapsacks
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:2286-2305, 2020.
Abstract
We introduce online learning with vector costs (OLVCp) where in each time step t∈{1,…,T}, we need to play an action i∈{1,…,n} that incurs an unknown vector cost in [0,1]d. The goal of the online algorithm is to minimize the ℓp norm of the sum of its cost vectors. This captures the classical online learning setting for d=1, and is interesting for general d because of applications like online scheduling where we want to balance the load between different machines (dimensions). We study OLVCp in both stochastic and adversarial arrival settings, and give a general procedure to reduce the problem from d dimensions to a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight O(min competitive ratio for adversarial arrivals. The OLVC_p problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (BWK) problem. This connection allows us to use our OLVC_p techniques to obtain (near) optimal results for BWK in both stochastic and adversarial settings. In particular, we obtain a tight O(\log d \cdot \log T) competitive ratio algorithm for adversarial BWK, which improves over the O(d \cdot \log T) competitive ratio algorithm of Immorlica et al. (2019).