Power of Hints for Online Learning with Movement Costs
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2818-2826, 2021.
Abstract
We consider the online linear optimization problem with movement costs, a variant of online learning in which the learner must not only respond to cost vectors $c_t$ with points $x_t$ in order to maintain low regret, but is also penalized for movement by an additional cost $\|x_t-x_{t+1}\|^{1+\epsilon}$ for some $\epsilon>0$. Classically, simple algorithms that obtain the optimal $\sqrt{T}$ regret are already very stable and do not incur a significant movement cost. However, recent work has shown that when the learning algorithm is provided with weak “hint” vectors that have a positive correlation with the costs, the regret can be significantly improved to $\log(T)$. In this work, we study the stability of such algorithms, and provide matching upper and lower bounds showing that incorporating movement costs results in intricate tradeoffs between $\log(T)$ regret when $\epsilon\ge 1$ and $\sqrt{T}$ regret when $\epsilon=0$.
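To make the setting concrete, the following is a minimal sketch of the protocol described above: at each round the learner plays $x_t$ in the unit ball, the adversary reveals a cost vector $c_t$, and the learner accumulates both the linear loss $\langle c_t, x_t\rangle$ and the movement cost $\|x_t-x_{t+1}\|^{1+\epsilon}$. The learner here is plain projected online gradient descent with the classical $1/\sqrt{T}$ step size, used only to illustrate the bookkeeping; it is not the hint-based algorithm studied in the paper, and the random cost sequence is an assumption for the simulation.

```python
import numpy as np

def simulate(T=1000, d=2, eps=1.0, lr=None, seed=0):
    """Simulate online linear optimization with movement costs.

    Returns (regret, movement_cost) for a projected online
    gradient descent learner on the unit ball. This is an
    illustrative baseline, not the paper's hint-based method.
    """
    rng = np.random.default_rng(seed)
    if lr is None:
        lr = 1.0 / np.sqrt(T)  # classical sqrt(T)-regret step size
    x = np.zeros(d)
    loss, move = 0.0, 0.0
    csum = np.zeros(d)
    for _ in range(T):
        c = rng.standard_normal(d)  # adversary's cost vector c_t
        loss += c @ x               # linear loss <c_t, x_t>
        x_next = x - lr * c         # gradient step
        norm = np.linalg.norm(x_next)
        if norm > 1.0:              # project back onto the unit ball
            x_next /= norm
        # movement penalty ||x_t - x_{t+1}||^{1+eps}
        move += np.linalg.norm(x_next - x) ** (1 + eps)
        x = x_next
        csum += c
    # Best fixed point in hindsight on the unit ball achieves
    # total loss -||sum_t c_t||, so regret is loss minus that.
    regret = loss + np.linalg.norm(csum)
    return regret, move
```

With the $1/\sqrt{T}$ step size, each per-round move has norm at most $\mathrm{lr}\,\|c_t\|$, so the total movement cost stays modest, illustrating the abstract's point that classical $\sqrt{T}$-regret algorithms are already stable.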