On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation

Álvaro Labarca Silva, Denis Parra, Rodrigo Toro Icarte
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:45432-45450, 2024.

Abstract

In recent years, Reinforcement Learning (RL) has shown great promise in session-based recommendation. Sequential models that use RL have reached state-of-the-art performance for the Next-item Prediction (NIP) task. This result is intriguing, as the NIP task only evaluates how well the system can correctly recommend the next item to the user, while the goal of RL is to find a policy that optimizes rewards in the long term – sometimes at the expense of suboptimal short-term performance. Then, how can RL improve the system’s performance on short-term metrics? This article investigates this question by exploring proxy learning objectives, which we identify as goals RL models might be following, and thus could explain the performance boost. We found that RL – when used as an auxiliary loss – promotes the learning of embeddings that capture information about the user’s previously interacted items. Subsequently, we replaced the RL objective with a straightforward auxiliary loss designed to predict the number of items the user interacted with. This substitution results in performance gains comparable to RL. These findings pave the way to improve performance and understanding of RL methods for recommender systems.
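To make the replacement described above concrete, here is a minimal sketch (written in PyTorch; this is not the authors' implementation) of a sequential recommender trained on next-item prediction together with a simple auxiliary head that predicts how many items the user has interacted with, standing in for the RL auxiliary loss. The GRU backbone, layer sizes, loss weight, and all names are illustrative assumptions.

# Minimal sketch (not the authors' code): next-item prediction plus an
# auxiliary loss that predicts the number of items the user interacted with.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NextItemWithCountAux(nn.Module):
    def __init__(self, num_items, emb_dim=64, hidden_dim=64):
        super().__init__()
        # Item id 0 is reserved for padding.
        self.item_emb = nn.Embedding(num_items + 1, emb_dim, padding_idx=0)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.next_item_head = nn.Linear(hidden_dim, num_items + 1)  # main NIP objective
        self.count_head = nn.Linear(hidden_dim, 1)                  # auxiliary: interaction count

    def forward(self, item_seq):
        x = self.item_emb(item_seq)      # (B, T, emb_dim)
        _, h = self.encoder(x)           # h: (1, B, hidden_dim)
        h = h.squeeze(0)                 # session embedding, (B, hidden_dim)
        return self.next_item_head(h), self.count_head(h).squeeze(-1)

def training_loss(model, item_seq, next_item, aux_weight=0.1):
    # item_seq: (B, T) padded item ids; next_item: (B,) ground-truth next item.
    logits, count_pred = model(item_seq)
    nip_loss = F.cross_entropy(logits, next_item)
    # Auxiliary target: number of non-padding items the user interacted with.
    count_target = (item_seq != 0).sum(dim=1).float()
    aux_loss = F.mse_loss(count_pred, count_target)
    return nip_loss + aux_weight * aux_loss

# Tiny usage example with random data.
model = NextItemWithCountAux(num_items=1000)
seq = torch.randint(1, 1001, (8, 20))
target = torch.randint(1, 1001, (8,))
loss = training_loss(model, seq, target)
loss.backward()

The point of the sketch is only to show where such an auxiliary objective attaches: it shares the session embedding with the next-item head, so gradients from the count prediction shape the embedding in the way the paper attributes to the RL loss.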

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-silva24b,
  title     = {On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation},
  author    = {Silva, \'{A}lvaro Labarca and Parra, Denis and Icarte, Rodrigo Toro},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {45432--45450},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/silva24b/silva24b.pdf},
  url       = {https://proceedings.mlr.press/v235/silva24b.html},
  abstract  = {In recent years, Reinforcement Learning (RL) has shown great promise in session-based recommendation. Sequential models that use RL have reached state-of-the-art performance for the Next-item Prediction (NIP) task. This result is intriguing, as the NIP task only evaluates how well the system can correctly recommend the next item to the user, while the goal of RL is to find a policy that optimizes rewards in the long term – sometimes at the expense of suboptimal short-term performance. Then, how can RL improve the system’s performance on short-term metrics? This article investigates this question by exploring proxy learning objectives, which we identify as goals RL models might be following, and thus could explain the performance boost. We found that RL – when used as an auxiliary loss – promotes the learning of embeddings that capture information about the user’s previously interacted items. Subsequently, we replaced the RL objective with a straightforward auxiliary loss designed to predict the number of items the user interacted with. This substitution results in performance gains comparable to RL. These findings pave the way to improve performance and understanding of RL methods for recommender systems.}
}
Endnote
%0 Conference Paper
%T On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation
%A Álvaro Labarca Silva
%A Denis Parra
%A Rodrigo Toro Icarte
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-silva24b
%I PMLR
%P 45432--45450
%U https://proceedings.mlr.press/v235/silva24b.html
%V 235
%X In recent years, Reinforcement Learning (RL) has shown great promise in session-based recommendation. Sequential models that use RL have reached state-of-the-art performance for the Next-item Prediction (NIP) task. This result is intriguing, as the NIP task only evaluates how well the system can correctly recommend the next item to the user, while the goal of RL is to find a policy that optimizes rewards in the long term – sometimes at the expense of suboptimal short-term performance. Then, how can RL improve the system’s performance on short-term metrics? This article investigates this question by exploring proxy learning objectives, which we identify as goals RL models might be following, and thus could explain the performance boost. We found that RL – when used as an auxiliary loss – promotes the learning of embeddings that capture information about the user’s previously interacted items. Subsequently, we replaced the RL objective with a straightforward auxiliary loss designed to predict the number of items the user interacted with. This substitution results in performance gains comparable to RL. These findings pave the way to improve performance and understanding of RL methods for recommender systems.
APA
Silva, Á.L., Parra, D. & Icarte, R.T. (2024). On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:45432-45450. Available from https://proceedings.mlr.press/v235/silva24b.html.
