Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem

Junyu Cao; Wei Sun

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:912-920, 2019.

Abstract

Motivated by the phenomenon that companies introduce new products to keep abreast of customers’ rapidly changing tastes, we consider a novel online learning setting where a profit-maximizing seller needs to learn customers’ preferences through offering recommendations, which may contain existing products and new products that are launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers’ behavior when product recommendations are presented in tiers. For the offline version with known customers’ preferences, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint that ensures every new product is learned to a given accuracy. Our results demonstrate that the tier structure can be used to mitigate the risks associated with learning new products.

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-cao19a,
  title     = {Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem},
  author    = {Cao, Junyu and Sun, Wei},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {912--920},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/cao19a/cao19a.pdf},
  url       = {https://proceedings.mlr.press/v97/cao19a.html},
  abstract  = {Motivated by the phenomenon that companies introduce new products to keep abreast with customers’ rapidly changing tastes, we consider a novel online learning setting where a profit-maximizing seller needs to learn customers’ preferences through offering recommendations, which may contain existing products and new products that are launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers’ behavior when product recommendations are presented in tiers. For the offline version with known customers’ preferences, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint which ensures every new product is learned to a given accuracy. Our results demonstrate the tier structure can be used to mitigate the risks associated with learning new products.}
}

Endnote

%0 Conference Paper
%T Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem
%A Junyu Cao
%A Wei Sun
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-cao19a
%I PMLR
%P 912--920
%U https://proceedings.mlr.press/v97/cao19a.html
%V 97
%X Motivated by the phenomenon that companies introduce new products to keep abreast with customers’ rapidly changing tastes, we consider a novel online learning setting where a profit-maximizing seller needs to learn customers’ preferences through offering recommendations, which may contain existing products and new products that are launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers’ behavior when product recommendations are presented in tiers. For the offline version with known customers’ preferences, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint which ensures every new product is learned to a given accuracy. Our results demonstrate the tier structure can be used to mitigate the risks associated with learning new products.

APA


Cao, J., & Sun, W. (2019). Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:912-920. Available from https://proceedings.mlr.press/v97/cao19a.html.
