Mem-Rec: Memory Efficient Recommendation System using Alternative Representation

Gopi Krishna Jha, Anthony Thomas, Nilesh Jain, Sameh Gobriel, Tajana Rosing, Ravi Iyer
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:518-533, 2024.

Abstract

Deep learning-based recommendation systems (e.g., DLRMs) are widely used AI models that provide high-quality personalized recommendations. Training data used for modern recommendation systems commonly includes categorical features taking on tens of millions of possible distinct values. These categorical tokens are typically assigned learned vector representations that are stored in large embedding tables, on the order of 100s of GB. Storing and accessing these tables represents a substantial performance burden. Our work proposes MEM-REC, a novel alternative representation approach for embedding tables. MEM-REC leverages Bloom filters and hashing methods to encode categorical features using two cache-friendly embedding tables. The first table (token embedding) contains raw embeddings (i.e., learned vector representations), and the second table (weight embedding), which is much smaller, contains weights that scale these raw embeddings to provide better discriminative capability for each data point. We provide a detailed architecture, design, and analysis of MEM-REC, addressing trade-offs in accuracy and computation requirements. In comparison with state-of-the-art techniques, MEM-REC not only maintains recommendation quality and significantly reduces the memory footprint for commercial-scale recommendation models, but also improves embedding latency. In particular, based on our results, MEM-REC compresses the MLPerf CriteoTB benchmark DLRM model size by $2900\times$ and performs up to $3.4\times$ faster embeddings while achieving the same AUC as the full uncompressed model.
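The core idea of the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the paper's implementation: table sizes, the number of hash functions, and the combination rule (a weighted sum of hashed rows, with per-token weights drawn from a smaller table) are assumptions made for illustration. The point it demonstrates is that arbitrarily large categorical ID spaces can be mapped into two small, cache-friendly tables via multiple hashed lookups.

```python
import numpy as np

# Hypothetical sizes; the paper's actual configuration is not given here.
TOKEN_TABLE_ROWS = 4096    # small shared "token embedding" table
WEIGHT_TABLE_ROWS = 512    # much smaller "weight embedding" table
EMBED_DIM = 16
NUM_HASHES = 3             # Bloom-filter-style number of hash functions

rng = np.random.default_rng(0)
token_table = rng.normal(size=(TOKEN_TABLE_ROWS, EMBED_DIM))
weight_table = rng.normal(size=(WEIGHT_TABLE_ROWS, NUM_HASHES))

def bloom_embed(token_id: int) -> np.ndarray:
    """Map a categorical token to a dense vector via k hashed lookups.

    Each of k hash functions selects a row of the shared token table;
    the rows are combined using per-token weights from the (smaller)
    weight table, so distinct tokens can remain distinguishable even
    when some of their hashed rows collide.
    """
    rows = [hash((token_id, seed)) % TOKEN_TABLE_ROWS for seed in range(NUM_HASHES)]
    weights = weight_table[hash(token_id) % WEIGHT_TABLE_ROWS]
    return sum(w * token_table[r] for w, r in zip(weights, rows))

# Works for IDs far larger than either table, e.g. tens of millions of values.
vec = bloom_embed(123_456_789)
```

Note the memory trade-off this sketch captures: total storage is the two fixed-size tables, independent of the vocabulary size, at the cost of a few extra hash computations and row reads per lookup.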

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-jha24a,
  title     = {{Mem-Rec}: {M}emory Efficient Recommendation System using Alternative Representation},
  author    = {Jha, Gopi Krishna and Thomas, Anthony and Jain, Nilesh and Gobriel, Sameh and Rosing, Tajana and Iyer, Ravi},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages     = {518--533},
  year      = {2024},
  editor    = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume    = {222},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v222/jha24a/jha24a.pdf},
  url       = {https://proceedings.mlr.press/v222/jha24a.html},
  abstract  = {Deep learning-based recommendation systems (e.g., DLRMs) are widely used AI models to provide high-quality personalized recommendations. Training data used for modern recommendation systems commonly includes categorical features taking on tens-of-millions of possible distinct values. These categorical tokens are typically assigned learned vector representations, that are stored in large embedding tables, on the order of 100s of GB. Storing and accessing these tables represent a substantial performance burden. Our work proposes MEM-REC, a novel alternative representation approach for embedding tables. MEM-REC leverages Bloom filters and hashing methods to encode categorical features using two cache-friendly embedding tables. The first table (token embedding) contains raw embeddings (i.e. learned vector representation), and the second table (weight embedding), which is much smaller, contains weights to scale these raw embeddings to provide better discriminative capability to each data point. We provide a detailed architecture, design and analysis of MEM-REC addressing trade-offs in accuracy and computation requirements. In comparison with state-of-the-art techniques MEM-REC can not only maintain the recommendation quality and significantly reduce the memory footprint for commercial scale recommendation models but can also improve the embedding latency. In particular, based on our results, MEM-REC compresses the MLPerf CriteoTB benchmark DLRM model size by $2900\times$ and performs up to $3.4\times$ faster embeddings while achieving the same AUC as that of the full uncompressed model.}
}
Endnote
%0 Conference Paper %T Mem-Rec: Memory Efficient Recommendation System using Alternative Representation %A Gopi Krishna Jha %A Anthony Thomas %A Nilesh Jain %A Sameh Gobriel %A Tajana Rosing %A Ravi Iyer %B Proceedings of the 15th Asian Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2024 %E Berrin Yanıkoğlu %E Wray Buntine %F pmlr-v222-jha24a %I PMLR %P 518--533 %U https://proceedings.mlr.press/v222/jha24a.html %V 222 %X Deep learning-based recommendation systems (e.g., DLRMs) are widely used AI models to provide high-quality personalized recommendations. Training data used for modern recommendation systems commonly includes categorical features taking on tens-of-millions of possible distinct values. These categorical tokens are typically assigned learned vector representations, that are stored in large embedding tables, on the order of 100s of GB. Storing and accessing these tables represent a substantial performance burden. Our work proposes MEM-REC, a novel alternative representation approach for embedding tables. MEM-REC leverages Bloom filters and hashing methods to encode categorical features using two cache-friendly embedding tables. The first table (token embedding) contains raw embeddings (i.e. learned vector representation), and the second table (weight embedding), which is much smaller, contains weights to scale these raw embeddings to provide better discriminative capability to each data point. We provide a detailed architecture, design and analysis of MEM-REC addressing trade-offs in accuracy and computation requirements. In comparison with state-of-the-art techniques MEM-REC can not only maintain the recommendation quality and significantly reduce the memory footprint for commercial scale recommendation models but can also improve the embedding latency. In particular, based on our results, MEM-REC compresses the MLPerf CriteoTB benchmark DLRM model size by $2900\times$ and performs up to $3.4\times$ faster embeddings while achieving the same AUC as that of the full uncompressed model.
APA
Jha, G.K., Thomas, A., Jain, N., Gobriel, S., Rosing, T. & Iyer, R. (2024). Mem-Rec: Memory Efficient Recommendation System using Alternative Representation. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:518-533. Available from https://proceedings.mlr.press/v222/jha24a.html.