A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks

Nicholas Monath, Will Sussman Grathwohl, Michael Boratko, Rob Fergus, Andrew McCallum, Manzil Zaheer
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:36072-36087, 2024.

Abstract

In dense retrieval, deep encoders provide embeddings for both inputs and targets, and the softmax function is used to parameterize a distribution over a large number of candidate targets (e.g., textual passages for information retrieval). Significant challenges arise in training such encoders in the increasingly prevalent scenario of (1) a large number of targets, (2) a computationally expensive target encoder model, (3) cached target embeddings that are out-of-date due to ongoing training of target encoder parameters. This paper presents a simple and highly scalable response to these challenges by training a small parametric corrector network that adjusts stale cached target embeddings, enabling an accurate softmax approximation and thereby sampling of up-to-date high scoring "hard negatives." We theoretically investigate the generalization properties of our proposed target corrector, relating the complexity of the network, staleness of cached representations, and the amount of training data. We present experimental results on large benchmark dense retrieval datasets as well as on QA with retrieval augmented language models. Our approach matches state-of-the-art results even when no target embedding updates are made during training beyond an initial cache from the unsupervised pre-trained model, providing a 4-80x reduction in re-embedding computational cost.
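To make the mechanism in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: a small corrector network is trained to map stale cached target embeddings toward freshly re-computed ones for a minibatch of targets, and the corrected cache is then used to approximate the current softmax and sample high-scoring "hard negatives." This is an illustration only, not the authors' implementation; the names (Corrector, target_encoder, cache) and the simple MSE training objective are assumptions made for the sketch.

    # Hypothetical sketch of a target corrector network (not the authors' code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Corrector(nn.Module):
        """Small parametric network applied to stale cached target embeddings."""
        def __init__(self, dim: int, hidden: int = 512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, dim),
            )

        def forward(self, stale_emb: torch.Tensor) -> torch.Tensor:
            # Residual correction: corrected = stale embedding + learned adjustment.
            return stale_emb + self.net(stale_emb)

    dim, n_targets, batch = 128, 10_000, 32
    cache = torch.randn(n_targets, dim)        # stale embeddings from an old checkpoint
    target_encoder = nn.Linear(dim, dim)       # stand-in for the (expensive) current target encoder
    corrector = Corrector(dim)
    opt = torch.optim.Adam(corrector.parameters(), lr=1e-3)

    # Re-embed only a small minibatch of targets with the current encoder, then
    # regress the corrected stale embeddings onto these fresh embeddings
    # (an illustrative objective; the paper's exact training loss may differ).
    idx = torch.randint(0, n_targets, (batch,))
    raw_inputs = torch.randn(batch, dim)       # placeholder for the targets' raw features/text
    with torch.no_grad():
        fresh = target_encoder(raw_inputs)     # up-to-date embeddings for the minibatch
    corrected = corrector(cache[idx])
    loss = F.mse_loss(corrected, fresh)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # The corrected cache can then approximate the softmax over all targets,
    # e.g. to sample hard negatives for a query embedding q (gold targets
    # would be masked out in practice).
    with torch.no_grad():
        q = torch.randn(1, dim)
        scores = q @ corrector(cache).T        # approximate logits over all targets
        probs = scores.softmax(dim=-1)
        hard_negatives = torch.multinomial(probs, num_samples=5)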

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-monath24a,
  title = {A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks},
  author = {Monath, Nicholas and Grathwohl, Will Sussman and Boratko, Michael and Fergus, Rob and McCallum, Andrew and Zaheer, Manzil},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {36072--36087},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/monath24a/monath24a.pdf},
  url = {https://proceedings.mlr.press/v235/monath24a.html},
  abstract = {In dense retrieval, deep encoders provide embeddings for both inputs and targets, and the softmax function is used to parameterize a distribution over a large number of candidate targets (e.g., textual passages for information retrieval). Significant challenges arise in training such encoders in the increasingly prevalent scenario of (1) a large number of targets, (2) a computationally expensive target encoder model, (3) cached target embeddings that are out-of-date due to ongoing training of target encoder parameters. This paper presents a simple and highly scalable response to these challenges by training a small parametric corrector network that adjusts stale cached target embeddings, enabling an accurate softmax approximation and thereby sampling of up-to-date high scoring "hard negatives." We theoretically investigate the generalization properties of our proposed target corrector, relating the complexity of the network, staleness of cached representations, and the amount of training data. We present experimental results on large benchmark dense retrieval datasets as well as on QA with retrieval augmented language models. Our approach matches state-of-the-art results even when no target embedding updates are made during training beyond an initial cache from the unsupervised pre-trained model, providing a 4-80x reduction in re-embedding computational cost.}
}
Endnote
%0 Conference Paper
%T A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks
%A Nicholas Monath
%A Will Sussman Grathwohl
%A Michael Boratko
%A Rob Fergus
%A Andrew McCallum
%A Manzil Zaheer
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-monath24a
%I PMLR
%P 36072--36087
%U https://proceedings.mlr.press/v235/monath24a.html
%V 235
%X In dense retrieval, deep encoders provide embeddings for both inputs and targets, and the softmax function is used to parameterize a distribution over a large number of candidate targets (e.g., textual passages for information retrieval). Significant challenges arise in training such encoders in the increasingly prevalent scenario of (1) a large number of targets, (2) a computationally expensive target encoder model, (3) cached target embeddings that are out-of-date due to ongoing training of target encoder parameters. This paper presents a simple and highly scalable response to these challenges by training a small parametric corrector network that adjusts stale cached target embeddings, enabling an accurate softmax approximation and thereby sampling of up-to-date high scoring "hard negatives." We theoretically investigate the generalization properties of our proposed target corrector, relating the complexity of the network, staleness of cached representations, and the amount of training data. We present experimental results on large benchmark dense retrieval datasets as well as on QA with retrieval augmented language models. Our approach matches state-of-the-art results even when no target embedding updates are made during training beyond an initial cache from the unsupervised pre-trained model, providing a 4-80x reduction in re-embedding computational cost.
APA
Monath, N., Grathwohl, W.S., Boratko, M., Fergus, R., McCallum, A. & Zaheer, M. (2024). A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:36072-36087. Available from https://proceedings.mlr.press/v235/monath24a.html.