Online Learning with Vector Costs and Bandits with Knapsacks

Thomas Kesselheim; Sahil Singla

Online Learning with Vector Costs and Bandits with Knapsacks

Thomas Kesselheim, Sahil Singla

Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:2286-2305, 2020.

Abstract

We introduce online learning with vector costs (

$OLVC_p$ ) where in each time step

$t \in \{1,\ldots, T\}$ , we need to play an action

$i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in

$[0,1]^d$ . The goal of the online algorithm is to minimize the

$\ell_p$ norm of the sum of its cost vectors. This captures the classical online learning setting for

$d=1$ , and is interesting for general

$d$ because of applications like online scheduling where we want to balance the load between different machines (dimensions). We study

$OLVC_p$ in both stochastic and adversarial arrival settings, and give a general procedure to reduce the problem from

$d$ dimensions to a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight

$O(\min\{p, \log d\})$ competitive ratio for adversarial arrivals. The

$OLVC_p$ problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (BWK) problem. This connection allows us to use our

$OLVC_p$ techniques to obtain (near) optimal results for BWK in both stochastic and adversarial settings. In particular, we obtain a tight

$O(\log d \cdot \log T)$ competitive ratio algorithm for adversarial BWK, which improves over the

$O(d \cdot \log T)$ competitive ratio algorithm of Immorlica et al. (2019).

Cite this Paper

BibTeX


@InProceedings{pmlr-v125-kesselheim20a,
  title = 	 {Online Learning with Vector Costs and Bandits with Knapsacks},
  author =       {Kesselheim, Thomas and Singla, Sahil},
  booktitle = 	 {Proceedings of Thirty Third Conference on Learning Theory},
  pages = 	 {2286--2305},
  year = 	 {2020},
  editor = 	 {Abernethy, Jacob and Agarwal, Shivani},
  volume = 	 {125},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--12 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v125/kesselheim20a/kesselheim20a.pdf},
  url = 	 {https://proceedings.mlr.press/v125/kesselheim20a.html},
  abstract = 	 { We introduce online learning with vector costs ($OLVC_p$)  where in each time step $t \in \{1,\ldots, T\}$, we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^d$. The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors. This captures the classical online learning setting for $d=1$, and is interesting for general $d$ because of applications like online scheduling where we want to balance the load between different machines (dimensions). We study $OLVC_p$ in both stochastic and adversarial arrival settings, and give a general procedure to reduce the  problem from $d$ dimensions to  a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm   (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight $O(\min\{p, \log d\})$ competitive ratio for adversarial arrivals. The $OLVC_p$ problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (BWK) problem. This connection allows us to use our $OLVC_p$ techniques to obtain (near) optimal results for BWK in both stochastic and adversarial settings. In particular, we obtain a tight $O(\log d \cdot \log T)$ competitive ratio algorithm for adversarial BWK, which improves over the $O(d \cdot \log T)$ competitive ratio algorithm  of Immorlica et al. (2019).}
}

Endnote

%0 Conference Paper
%T Online Learning with Vector Costs and Bandits with Knapsacks
%A Thomas Kesselheim
%A Sahil Singla
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal	
%F pmlr-v125-kesselheim20a
%I PMLR
%P 2286--2305
%U https://proceedings.mlr.press/v125/kesselheim20a.html
%V 125
%X  We introduce online learning with vector costs ($OLVC_p$)  where in each time step $t \in \{1,\ldots, T\}$, we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^d$. The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors. This captures the classical online learning setting for $d=1$, and is interesting for general $d$ because of applications like online scheduling where we want to balance the load between different machines (dimensions). We study $OLVC_p$ in both stochastic and adversarial arrival settings, and give a general procedure to reduce the  problem from $d$ dimensions to  a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm   (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight $O(\min\{p, \log d\})$ competitive ratio for adversarial arrivals. The $OLVC_p$ problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (BWK) problem. This connection allows us to use our $OLVC_p$ techniques to obtain (near) optimal results for BWK in both stochastic and adversarial settings. In particular, we obtain a tight $O(\log d \cdot \log T)$ competitive ratio algorithm for adversarial BWK, which improves over the $O(d \cdot \log T)$ competitive ratio algorithm  of Immorlica et al. (2019).

APA


Kesselheim, T. & Singla, S.. (2020). Online Learning with Vector Costs and Bandits with Knapsacks. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:2286-2305 Available from https://proceedings.mlr.press/v125/kesselheim20a.html.

Related Material

Download PDF