Learning from a Learning User for Optimal Recommendations

Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, Haifeng Xu
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:25382-25406, 2022.

Abstract

In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience about previously consumed items. This in turn affects their interaction dynamics with the system and can invalidate previous algorithms built on the omniscient user assumption. In this paper, we formalize a model to capture such “learning users” and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges brought by the non-stationary feedback from such a learning user. Interestingly, we prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse, until reaching linear regret when the user’s learning fails to converge. Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. Our study provides a novel perspective on modeling the feedback loop in recommendation problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-yao22a,
  title     = {Learning from a Learning User for Optimal Recommendations},
  author    = {Yao, Fan and Li, Chuanhao and Nekipelov, Denis and Wang, Hongning and Xu, Haifeng},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {25382--25406},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/yao22a/yao22a.pdf},
  url       = {https://proceedings.mlr.press/v162/yao22a.html},
  abstract  = {In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience about previously consumed items. This in turn affects their interaction dynamics with the system and can invalidate previous algorithms built on the omniscient user assumption. In this paper, we formalize a model to capture such “learning users” and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges brought by the non-stationary feedback from such a learning user. Interestingly, we prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse, until reaching linear regret when the user’s learning fails to converge. Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. Our study provides a novel perspective on modeling the feedback loop in recommendation problems.}
}
Endnote
%0 Conference Paper
%T Learning from a Learning User for Optimal Recommendations
%A Fan Yao
%A Chuanhao Li
%A Denis Nekipelov
%A Hongning Wang
%A Haifeng Xu
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-yao22a
%I PMLR
%P 25382--25406
%U https://proceedings.mlr.press/v162/yao22a.html
%V 162
%X In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience about previously consumed items. This in turn affects their interaction dynamics with the system and can invalidate previous algorithms built on the omniscient user assumption. In this paper, we formalize a model to capture such “learning users” and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges brought by the non-stationary feedback from such a learning user. Interestingly, we prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse, until reaching linear regret when the user’s learning fails to converge. Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. Our study provides a novel perspective on modeling the feedback loop in recommendation problems.
APA
Yao, F., Li, C., Nekipelov, D., Wang, H. & Xu, H. (2022). Learning from a Learning User for Optimal Recommendations. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:25382-25406. Available from https://proceedings.mlr.press/v162/yao22a.html.
