Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem

Junyu Cao, Wei Sun
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:912-920, 2019.

Abstract

Motivated by the phenomenon that companies introduce new products to keep abreast of customers’ rapidly changing tastes, we consider a novel online learning setting where a profit-maximizing seller needs to learn customers’ preferences through offering recommendations, which may contain existing products and new products that are launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers’ behavior when product recommendations are presented in tiers. For the offline version with known customers’ preferences, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint which ensures every new product is learned to a given accuracy. Our results demonstrate that the tier structure can be used to mitigate the risks associated with learning new products.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-cao19a,
  title = {Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem},
  author = {Cao, Junyu and Sun, Wei},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages = {912--920},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/cao19a/cao19a.pdf},
  url = {https://proceedings.mlr.press/v97/cao19a.html},
  abstract = {Motivated by the phenomenon that companies introduce new products to keep abreast with customers’ rapidly changing tastes, we consider a novel online learning setting where a profit-maximizing seller needs to learn customers’ preferences through offering recommendations, which may contain existing products and new products that are launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers’ behavior when product recommendations are presented in tiers. For the offline version with known customers’ preferences, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint which ensures every new product is learned to a given accuracy. Our results demonstrate the tier structure can be used to mitigate the risks associated with learning new products.}
}
Endnote
%0 Conference Paper
%T Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem
%A Junyu Cao
%A Wei Sun
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-cao19a
%I PMLR
%P 912--920
%U https://proceedings.mlr.press/v97/cao19a.html
%V 97
%X Motivated by the phenomenon that companies introduce new products to keep abreast with customers’ rapidly changing tastes, we consider a novel online learning setting where a profit-maximizing seller needs to learn customers’ preferences through offering recommendations, which may contain existing products and new products that are launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers’ behavior when product recommendations are presented in tiers. For the offline version with known customers’ preferences, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint which ensures every new product is learned to a given accuracy. Our results demonstrate the tier structure can be used to mitigate the risks associated with learning new products.
APA
Cao, J. & Sun, W. (2019). Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:912-920. Available from https://proceedings.mlr.press/v97/cao19a.html.
