When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment

Feng Zhu, Zeyu Zheng
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:11629-11638, 2020.

Abstract

We consider a single-product dynamic pricing problem under a specific non-stationary setting, where the underlying demand process grows over time in expectation and also possibly in the level of random fluctuation. The decision maker sequentially sets price in each time period and learns the unknown demand model, with the goal of maximizing expected cumulative revenue over a time horizon $T$. We prove matching upper and lower bounds on regret and provide near-optimal pricing policies, showing how the growth rate of random fluctuation over time affects the best achievable regret order and the near-optimal policy design. In the analysis, we show that whether the seller knows the length of time horizon $T$ in advance or not surprisingly render different optimal regret orders. We then extend the demand model such that the optimal price may vary with time and present a novel and near-optimal policy for the extended model. Finally, we consider an analogous non-stationary setting in the canonical multi-armed bandit problem, and points out that knowing or not knowing the length of time horizon $T$ render the same optimal regret order, in contrast to the non-stationary dynamic pricing problem.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-zhu20g, title = {When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment}, author = {Zhu, Feng and Zheng, Zeyu}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {11629--11638}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/zhu20g/zhu20g.pdf}, url = {https://proceedings.mlr.press/v119/zhu20g.html}, abstract = {We consider a single-product dynamic pricing problem under a specific non-stationary setting, where the underlying demand process grows over time in expectation and also possibly in the level of random fluctuation. The decision maker sequentially sets price in each time period and learns the unknown demand model, with the goal of maximizing expected cumulative revenue over a time horizon $T$. We prove matching upper and lower bounds on regret and provide near-optimal pricing policies, showing how the growth rate of random fluctuation over time affects the best achievable regret order and the near-optimal policy design. In the analysis, we show that whether the seller knows the length of time horizon $T$ in advance or not surprisingly render different optimal regret orders. We then extend the demand model such that the optimal price may vary with time and present a novel and near-optimal policy for the extended model. Finally, we consider an analogous non-stationary setting in the canonical multi-armed bandit problem, and points out that knowing or not knowing the length of time horizon $T$ render the same optimal regret order, in contrast to the non-stationary dynamic pricing problem.} }
Endnote
%0 Conference Paper %T When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment %A Feng Zhu %A Zeyu Zheng %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-zhu20g %I PMLR %P 11629--11638 %U https://proceedings.mlr.press/v119/zhu20g.html %V 119 %X We consider a single-product dynamic pricing problem under a specific non-stationary setting, where the underlying demand process grows over time in expectation and also possibly in the level of random fluctuation. The decision maker sequentially sets price in each time period and learns the unknown demand model, with the goal of maximizing expected cumulative revenue over a time horizon $T$. We prove matching upper and lower bounds on regret and provide near-optimal pricing policies, showing how the growth rate of random fluctuation over time affects the best achievable regret order and the near-optimal policy design. In the analysis, we show that whether the seller knows the length of time horizon $T$ in advance or not surprisingly render different optimal regret orders. We then extend the demand model such that the optimal price may vary with time and present a novel and near-optimal policy for the extended model. Finally, we consider an analogous non-stationary setting in the canonical multi-armed bandit problem, and points out that knowing or not knowing the length of time horizon $T$ render the same optimal regret order, in contrast to the non-stationary dynamic pricing problem.
APA
Zhu, F. & Zheng, Z.. (2020). When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:11629-11638 Available from https://proceedings.mlr.press/v119/zhu20g.html.

Related Material