Bandits with Movement Costs and Adaptive Pricing

Tomer Koren, Roi Livni, Yishay Mansour
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1242-1268, 2017.

Abstract

We extend the model of Multi-Armed Bandit with unit switching cost to incorporate a metric between the actions. We consider the case where the metric over the actions can be modeled by a complete binary tree, and the distance between two leaves is the size of the subtree of their least common ancestor, which abstracts the case that the actions are points on the continuous interval $[0,1]$ and the switching cost is their distance. In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{k}T + T/k)$, where $k$ is the number of actions and $T$ is the time horizon. When the set of actions corresponds to whole $[0,1]$ interval we can exploit our method for the task of bandit learning with Lipschitz loss functions, where our algorithm achieves an optimal regret rate of $\widetilde{Θ}(T^2/3)$, which is the same rate one obtains when there is no penalty for movements. As our main application, we use our new algorithm to solve an adaptive pricing problem. Specifically, we consider the case of a single seller faced with a stream of patient buyers. Each buyer has a private value and a window of time in which they are interested in buying, and they buy at the lowest price in the window, if it is below their value. We show that with an appropriate discretization of the prices, the seller can achieve a regret of $\widetilde{O}(T^2/3)$ compared to the best fixed price in hindsight, which outperform the previous regret bound of $\widetilde{O}(T^3/4)$ for the problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v65-koren17a, title = {Bandits with Movement Costs and Adaptive Pricing}, author = {Koren, Tomer and Livni, Roi and Mansour, Yishay}, booktitle = {Proceedings of the 2017 Conference on Learning Theory}, pages = {1242--1268}, year = {2017}, editor = {Kale, Satyen and Shamir, Ohad}, volume = {65}, series = {Proceedings of Machine Learning Research}, month = {07--10 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v65/koren17a/koren17a.pdf}, url = {https://proceedings.mlr.press/v65/koren17a.html}, abstract = {We extend the model of Multi-Armed Bandit with unit switching cost to incorporate a metric between the actions. We consider the case where the metric over the actions can be modeled by a complete binary tree, and the distance between two leaves is the size of the subtree of their least common ancestor, which abstracts the case that the actions are points on the continuous interval $[0,1]$ and the switching cost is their distance. In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{k}T + T/k)$, where $k$ is the number of actions and $T$ is the time horizon. When the set of actions corresponds to whole $[0,1]$ interval we can exploit our method for the task of bandit learning with Lipschitz loss functions, where our algorithm achieves an optimal regret rate of $\widetilde{Θ}(T^2/3)$, which is the same rate one obtains when there is no penalty for movements. As our main application, we use our new algorithm to solve an adaptive pricing problem. Specifically, we consider the case of a single seller faced with a stream of patient buyers. Each buyer has a private value and a window of time in which they are interested in buying, and they buy at the lowest price in the window, if it is below their value. We show that with an appropriate discretization of the prices, the seller can achieve a regret of $\widetilde{O}(T^2/3)$ compared to the best fixed price in hindsight, which outperform the previous regret bound of $\widetilde{O}(T^3/4)$ for the problem. } }
Endnote
%0 Conference Paper %T Bandits with Movement Costs and Adaptive Pricing %A Tomer Koren %A Roi Livni %A Yishay Mansour %B Proceedings of the 2017 Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2017 %E Satyen Kale %E Ohad Shamir %F pmlr-v65-koren17a %I PMLR %P 1242--1268 %U https://proceedings.mlr.press/v65/koren17a.html %V 65 %X We extend the model of Multi-Armed Bandit with unit switching cost to incorporate a metric between the actions. We consider the case where the metric over the actions can be modeled by a complete binary tree, and the distance between two leaves is the size of the subtree of their least common ancestor, which abstracts the case that the actions are points on the continuous interval $[0,1]$ and the switching cost is their distance. In this setting, we give a new algorithm that establishes a regret of $\widetilde{O}(\sqrt{k}T + T/k)$, where $k$ is the number of actions and $T$ is the time horizon. When the set of actions corresponds to whole $[0,1]$ interval we can exploit our method for the task of bandit learning with Lipschitz loss functions, where our algorithm achieves an optimal regret rate of $\widetilde{Θ}(T^2/3)$, which is the same rate one obtains when there is no penalty for movements. As our main application, we use our new algorithm to solve an adaptive pricing problem. Specifically, we consider the case of a single seller faced with a stream of patient buyers. Each buyer has a private value and a window of time in which they are interested in buying, and they buy at the lowest price in the window, if it is below their value. We show that with an appropriate discretization of the prices, the seller can achieve a regret of $\widetilde{O}(T^2/3)$ compared to the best fixed price in hindsight, which outperform the previous regret bound of $\widetilde{O}(T^3/4)$ for the problem.
APA
Koren, T., Livni, R. & Mansour, Y.. (2017). Bandits with Movement Costs and Adaptive Pricing. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1242-1268 Available from https://proceedings.mlr.press/v65/koren17a.html.

Related Material