Power of Hints for Online Learning with Movement Costs

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2818-2826, 2021.

Abstract

We consider the online linear optimization problem with movement costs, a variant of online learning in which the learner must not only respond to cost vectors $c_t$ with points $x_t$ in order to maintain low regret, but is also penalized for movement by an additional cost $\|x_t-x_{t+1}\|^{1+\epsilon}$ for some $\epsilon>0$. Classically, simple algorithms that obtain the optimal $\sqrt{T}$ regret are already very stable and do not incur a significant movement cost. However, recent work has shown that when the learning algorithm is provided with weak “hint” vectors that have a positive correlation with the costs, the regret can be significantly improved to $\log(T)$. In this work, we study the stability of such algorithms, and provide matching upper and lower bounds showing that incorporating movement costs results in intricate tradeoffs, with the optimal regret interpolating between $\log(T)$ when $\epsilon\ge 1$ and $\sqrt{T}$ when $\epsilon=0$.
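The cost model in the abstract can be made concrete with a minimal sketch: 1-D projected online gradient descent on linear losses $c_t x_t$, tracking both the regret and the accumulated movement penalty $|x_t-x_{t+1}|^{1+\epsilon}$. This illustrates the classical observation that the standard $\sqrt{T}$-regret algorithm is already stable (its total movement stays small); it is not the paper's hint-based algorithm, and all names below are ours.

```python
import math
import random

def ogd_with_movement_cost(costs, epsilon, radius=1.0):
    """Projected OGD on [-radius, radius] with linear losses c_t * x_t.

    Returns the cumulative linear loss and the cumulative movement
    penalty sum_t |x_t - x_{t+1}|^(1+epsilon).
    """
    x = 0.0
    linear_loss = 0.0
    movement = 0.0
    for t, c in enumerate(costs, start=1):
        linear_loss += c * x
        eta = 1.0 / math.sqrt(t)  # standard O(sqrt(T))-regret step size
        x_next = max(-radius, min(radius, x - eta * c))
        movement += abs(x_next - x) ** (1.0 + epsilon)
        x = x_next
    return linear_loss, movement

random.seed(0)
T = 1000
costs = [random.choice([-1.0, 1.0]) for _ in range(T)]
best_fixed = -abs(sum(costs))  # loss of the best fixed point in [-1, 1]
loss, move = ogd_with_movement_cost(costs, epsilon=1.0)
regret = loss - best_fixed
```

With step sizes $\eta_t = 1/\sqrt{t}$ and unit costs, each step moves at most $1/\sqrt{t}$, so for $\epsilon=1$ the total movement is at most $\sum_t 1/t = O(\log T)$, while the regret obeys the usual $O(\sqrt{T})$ bound.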

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-bhaskara21b,
  title     = {Power of Hints for Online Learning with Movement Costs},
  author    = {Bhaskara, Aditya and Cutkosky, Ashok and Kumar, Ravi and Purohit, Manish},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {2818--2826},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/bhaskara21b/bhaskara21b.pdf},
  url       = {https://proceedings.mlr.press/v130/bhaskara21b.html},
  abstract  = {We consider the online linear optimization problem with movement costs, a variant of online learning in which the learner must not only respond to cost vectors $c_t$ with points $x_t$ in order to maintain low regret, but is also penalized for movement by an additional cost $\|x_t-x_{t+1}\|^{1+\epsilon}$ for some $\epsilon>0$. Classically, simple algorithms that obtain the optimal $\sqrt{T}$ regret already are very stable and do not incur a significant movement cost. However, recent work has shown that when the learning algorithm is provided with weak “hint” vectors that have a positive correlation with the costs, the regret can be significantly improved to $\log(T)$. In this work, we study the stability of such algorithms, and provide matching upper and lower bounds showing that incorporating movement costs results in intricate tradeoffs between $\log(T)$ when $\epsilon\ge 1$ and $\sqrt{T}$ regret when $\epsilon=0$.}
}
Endnote
%0 Conference Paper
%T Power of Hints for Online Learning with Movement Costs
%A Aditya Bhaskara
%A Ashok Cutkosky
%A Ravi Kumar
%A Manish Purohit
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-bhaskara21b
%I PMLR
%P 2818--2826
%U https://proceedings.mlr.press/v130/bhaskara21b.html
%V 130
%X We consider the online linear optimization problem with movement costs, a variant of online learning in which the learner must not only respond to cost vectors $c_t$ with points $x_t$ in order to maintain low regret, but is also penalized for movement by an additional cost $\|x_t-x_{t+1}\|^{1+\epsilon}$ for some $\epsilon>0$. Classically, simple algorithms that obtain the optimal $\sqrt{T}$ regret already are very stable and do not incur a significant movement cost. However, recent work has shown that when the learning algorithm is provided with weak “hint” vectors that have a positive correlation with the costs, the regret can be significantly improved to $\log(T)$. In this work, we study the stability of such algorithms, and provide matching upper and lower bounds showing that incorporating movement costs results in intricate tradeoffs between $\log(T)$ when $\epsilon\ge 1$ and $\sqrt{T}$ regret when $\epsilon=0$.
APA
Bhaskara, A., Cutkosky, A., Kumar, R. & Purohit, M. (2021). Power of Hints for Online Learning with Movement Costs. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2818-2826. Available from https://proceedings.mlr.press/v130/bhaskara21b.html.
