Search-Based Serving Architecture of Embeddings-Based Recommendations

Sonya Liberman, Shaked Bar, Raphael Vannerom, Danny Rosenstein, Ronny Lempel
Proceedings of the 2nd Workshop on Online Recommender Systems and User Modeling, PMLR 109:12-20, 2019.

Abstract

Over the past 10 years, many recommendation techniques have been based on embedding users and items in latent vector spaces, where the inner product of a (user, item) pair of vectors represents the predicted affinity of the user to the item. A wealth of literature has focused on the various modeling approaches that result in embeddings, and has compared their quality metrics, learning complexity, etc. However, much less attention has been devoted to the issues surrounding productization of an embeddings-based, high-throughput, low-latency recommender system: in particular, how the system might keep up with the changing embeddings as new models are learnt. This paper describes a reference architecture of a high-throughput, large-scale recommendation service which leverages a search engine as its runtime core. We describe how the search index and the query builder adapt to changes in the embeddings, which often happen at a different cadence than index builds. We provide solutions for both id-based and feature-based embeddings, as well as for batch indexing and incremental indexing setups. The described system is at the core of a Web content discovery service that serves tens of billions of recommendations per day in response to billions of user requests.
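The scoring rule at the heart of the abstract (predicted affinity is the inner product of user and item embedding vectors, and serving returns the top-scoring items) can be sketched in plain Python. All identifiers and embedding values below are illustrative and do not come from the paper:

```python
# Sketch of inner-product affinity scoring: each user and item is a vector
# in a shared latent space; the recommendation step ranks items by the
# dot product with the user vector and keeps the k highest-scoring ones.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_k(user_vec, item_vecs, k):
    """Rank items by inner-product affinity and return the k best ids."""
    scored = [(dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
items = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.2, 0.8, 0.1],
    "c": [0.4, 0.4, 0.4],
}
user = [1.0, 0.0, 0.5]

print(top_k(user, items, 2))  # prints ['a', 'c']
```

A production system like the one the paper describes would of course not scan every item linearly; the point of the search-engine-based architecture is to evaluate this same inner-product ranking at scale inside an index, while tolerating embedding updates that arrive on a different schedule than index rebuilds.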

Cite this Paper


BibTeX
@InProceedings{pmlr-v109-liberman19a,
  title     = {Search-Based Serving Architecture of Embeddings-Based Recommendations},
  author    = {Liberman, Sonya and Bar, Shaked and Vannerom, Raphael and Rosenstein, Danny and Lempel, Ronny},
  booktitle = {Proceedings of the 2nd Workshop on Online Recommender Systems and User Modeling},
  pages     = {12--20},
  year      = {2019},
  editor    = {Vinagre, João and Jorge, Alípio Mário and Bifet, Albert and Al-Ghossein, Marie},
  volume    = {109},
  series    = {Proceedings of Machine Learning Research},
  month     = {19 Sep},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v109/liberman19a/liberman19a.pdf},
  url       = {https://proceedings.mlr.press/v109/liberman19a.html},
  abstract  = {Over the past 10 years, many recommendation techniques have been based on embedding users and items in latent vector spaces, where the inner product of a (user, item) pair of vectors represents the predicted affinity of the user to the item. A wealth of literature has focused on the various modeling approaches that result in embeddings, and has compared their quality metrics, learning complexity, etc. However, much less attention has been devoted to the issues surrounding productization of an embeddings-based, high-throughput, low-latency recommender system: in particular, how the system might keep up with the changing embeddings as new models are learnt. This paper describes a reference architecture of a high-throughput, large-scale recommendation service which leverages a search engine as its runtime core. We describe how the search index and the query builder adapt to changes in the embeddings, which often happen at a different cadence than index builds. We provide solutions for both id-based and feature-based embeddings, as well as for batch indexing and incremental indexing setups. The described system is at the core of a Web content discovery service that serves tens of billions of recommendations per day in response to billions of user requests.}
}
Endnote
%0 Conference Paper
%T Search-Based Serving Architecture of Embeddings-Based Recommendations
%A Sonya Liberman
%A Shaked Bar
%A Raphael Vannerom
%A Danny Rosenstein
%A Ronny Lempel
%B Proceedings of the 2nd Workshop on Online Recommender Systems and User Modeling
%C Proceedings of Machine Learning Research
%D 2019
%E João Vinagre
%E Alípio Mário Jorge
%E Albert Bifet
%E Marie Al-Ghossein
%F pmlr-v109-liberman19a
%I PMLR
%P 12--20
%U https://proceedings.mlr.press/v109/liberman19a.html
%V 109
%X Over the past 10 years, many recommendation techniques have been based on embedding users and items in latent vector spaces, where the inner product of a (user, item) pair of vectors represents the predicted affinity of the user to the item. A wealth of literature has focused on the various modeling approaches that result in embeddings, and has compared their quality metrics, learning complexity, etc. However, much less attention has been devoted to the issues surrounding productization of an embeddings-based, high-throughput, low-latency recommender system: in particular, how the system might keep up with the changing embeddings as new models are learnt. This paper describes a reference architecture of a high-throughput, large-scale recommendation service which leverages a search engine as its runtime core. We describe how the search index and the query builder adapt to changes in the embeddings, which often happen at a different cadence than index builds. We provide solutions for both id-based and feature-based embeddings, as well as for batch indexing and incremental indexing setups. The described system is at the core of a Web content discovery service that serves tens of billions of recommendations per day in response to billions of user requests.
APA
Liberman, S., Bar, S., Vannerom, R., Rosenstein, D. & Lempel, R. (2019). Search-Based Serving Architecture of Embeddings-Based Recommendations. Proceedings of the 2nd Workshop on Online Recommender Systems and User Modeling, in Proceedings of Machine Learning Research 109:12-20. Available from https://proceedings.mlr.press/v109/liberman19a.html.