Expert with clustering: Hierarchical online preference learning framework

Tianyue Zhou, Jung-Hoon Cho, Babak Rahimi Ardabili, Hamed Tabkhi, Cathy Wu
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:707-718, 2024.

Abstract

Emerging mobility systems are increasingly capable of recommending options to mobility users, to guide them towards personalized yet sustainable system outcomes. Even more so than the typical recommendation system, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) the system sustainability relies on sufficient user participation. In this study, we thus consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users within a population. We therefore introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric. This metric is instrumental in generating more representative cluster centroids, thereby enhancing the performance of recommendation systems. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. The algorithm performs with low regret, especially when a latent hierarchical structure exists among users. This regret bound underscores the theoretical and experimental efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. Experimental results highlight that EWC can substantially reduce regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.
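The abstract's combination of clustering with prediction with expert advice can be illustrated by treating each cluster centroid as an expert and running Hedge over those experts for a single user. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`hedge_update`, `recommend`, `run_user`), the linear scoring of options, the 0/1 loss, and the fixed learning rate `eta` are all hypothetical choices made for illustration, and the clustering step (including the Loss-guided Distance metric) is assumed to have already produced the centroids.

```python
import math
import random


def hedge_update(weights, losses, eta):
    """One Hedge step: scale each expert's weight by exp(-eta * loss), then renormalize."""
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def recommend(theta, options):
    """Index of the option whose feature vector scores highest under preference vector theta."""
    scores = [sum(t * x for t, x in zip(theta, opt)) for opt in options]
    return scores.index(max(scores))


def run_user(centroids, rounds, eta, seed=0):
    """Run Hedge for one user, with cluster centroids acting as experts (assumed setup).

    rounds is a list of (options, chosen) pairs: the option feature vectors
    shown in that round and the index the user actually picked.
    Returns (final_weights, cumulative 0/1 loss of the learner).
    """
    rng = random.Random(seed)
    weights = [1.0 / len(centroids)] * len(centroids)
    total_loss = 0
    for options, chosen in rounds:
        # Each expert (centroid) proposes the option it scores highest.
        advice = [recommend(theta, options) for theta in centroids]
        # The learner follows one expert, sampled in proportion to its weight.
        expert = rng.choices(range(len(centroids)), weights=weights)[0]
        total_loss += int(advice[expert] != chosen)
        # Every expert is charged a 0/1 loss against the user's actual choice.
        losses = [float(a != chosen) for a in advice]
        weights = hedge_update(weights, losses, eta)
    return weights, total_loss
```

For example, with two centroids `[[1.0, 0.0], [0.0, 1.0]]` and a user who always picks the first of two options `[[1.0, 0.0], [0.0, 1.0]]`, the weight on the first centroid approaches 1 after a few rounds, which is the behavior the Hedge term of the regret bound relies on.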

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-zhou24a,
  title = {Expert with Clustering: {H}ierarchical Online Preference Learning Framework},
  author = {Zhou, Tianyue and Cho, Jung-Hoon and Ardabili, Babak Rahimi and Tabkhi, Hamed and Wu, Cathy},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages = {707--718},
  year = {2024},
  editor = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume = {242},
  series = {Proceedings of Machine Learning Research},
  month = {15--17 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v242/zhou24a/zhou24a.pdf},
  url = {https://proceedings.mlr.press/v242/zhou24a.html},
  abstract = {Emerging mobility systems are increasingly capable of recommending options to mobility users, to guide them towards personalized yet sustainable system outcomes. Even more so than the typical recommendation system, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) the system sustainability relies on sufficient user participation. In this study, we thus consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users within a population. We therefore introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric. This metric is instrumental in generating more representative cluster centroids, thereby enhancing the performance of recommendation systems. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. The algorithm performs with low regret, especially when a latent hierarchical structure exists among users. This regret bound underscores the theoretical and experimental efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. Experimental results highlight that EWC can substantially reduce regret by 27.57\% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.}
}
Endnote
%0 Conference Paper
%T Expert with clustering: Hierarchical online preference learning framework
%A Tianyue Zhou
%A Jung-Hoon Cho
%A Babak Rahimi Ardabili
%A Hamed Tabkhi
%A Cathy Wu
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-zhou24a
%I PMLR
%P 707--718
%U https://proceedings.mlr.press/v242/zhou24a.html
%V 242
%X Emerging mobility systems are increasingly capable of recommending options to mobility users, to guide them towards personalized yet sustainable system outcomes. Even more so than the typical recommendation system, it is crucial to minimize regret, because 1) the mobility options directly affect the lives of the users, and 2) the system sustainability relies on sufficient user participation. In this study, we thus consider accelerating user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users within a population. We therefore introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques and prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric. This metric is instrumental in generating more representative cluster centroids, thereby enhancing the performance of recommendation systems. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. The algorithm performs with low regret, especially when a latent hierarchical structure exists among users. This regret bound underscores the theoretical and experimental efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. Experimental results highlight that EWC can substantially reduce regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.
APA
Zhou, T., Cho, J.-H., Ardabili, B. R., Tabkhi, H., & Wu, C. (2024). Expert with clustering: Hierarchical online preference learning framework. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:707-718. Available from https://proceedings.mlr.press/v242/zhou24a.html.
