In defense of dual-encoders for neural ranking

Aditya Menon; Sadeep Jayasumana; Ankit Singh Rawat; Seungyeon Kim; Sashank Reddi; Sanjiv Kumar

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:15376-15400, 2022.

Abstract

Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query. There are two broad flavours of such models: cross-attention (CA) models, which learn a joint embedding for the query and document, and dual-encoder (DE) models, which learn separate embeddings for the query and document. Empirically, CA models are often found to be more accurate, which has motivated a series of works seeking to bridge this gap. However, a more fundamental question remains less explored: does this performance gap reflect an inherent limitation in the capacity of DE models, or a limitation in the training of such models? And does such an understanding suggest a principled means of improving DE models? In this paper, we study these questions, with three contributions. First, we establish theoretically that with a sufficiently large embedding dimension, DE models have the capacity to model a broad class of score distributions. Second, we show empirically that on real-world problems, DE models may overfit to spurious correlations in the training set, and thus under-perform on test samples. Third, to mitigate this behaviour, we propose a suitable distillation strategy, and confirm its practical efficacy on the MSMARCO-Passage and Natural Questions benchmarks.
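
As a rough illustration of the setting described in the abstract, the sketch below scores documents with a dual-encoder (separate query and document embeddings combined by an inner product) and computes a generic softmax cross-entropy distillation loss against teacher scores. This is a minimal sketch under stated assumptions, not the paper's method: the abstract does not specify the distillation strategy, and the function names, toy embeddings, and teacher scores here are all hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def de_scores(query_emb, doc_embs):
    """Dual-encoder scoring: query and documents are embedded separately,
    and each score is the inner product of the two embeddings."""
    return [dot(query_emb, d) for d in doc_embs]


def distillation_loss(teacher_scores, student_scores):
    """Cross-entropy between softmax(teacher) and softmax(student) over the
    candidate documents. A generic choice, assumed here for illustration."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Hypothetical toy data: random embeddings stand in for encoder outputs.
random.seed(0)
dim = 4
query_emb = [random.gauss(0, 1) for _ in range(dim)]
doc_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]

# Stand-in for scores produced by a cross-attention (teacher) model.
teacher_scores = [2.0, 0.5, -1.0]

student_scores = de_scores(query_emb, doc_embs)
loss = distillation_loss(teacher_scores, student_scores)
print(student_scores, loss)
```

Because the document embeddings do not depend on the query, they can be computed once and reused across queries, which is the usual practical argument for the DE form.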

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-menon22a,
  title = 	 {In defense of dual-encoders for neural ranking},
  author =       {Menon, Aditya and Jayasumana, Sadeep and Rawat, Ankit Singh and Kim, Seungyeon and Reddi, Sashank and Kumar, Sanjiv},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {15376--15400},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/menon22a/menon22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/menon22a.html},
  abstract = 	 {Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query. There are two broad flavours of such models: cross-attention (CA) models, which learn a joint embedding for the query and document, and dual-encoder (DE) models, which learn separate embeddings for the query and document. Empirically, CA models are often found to be more accurate, which has motivated a series of works seeking to bridge this gap. However, a more fundamental question remains less explored: does this performance gap reflect an inherent limitation in the capacity of DE models, or a limitation in the training of such models? And does such an understanding suggest a principled means of improving DE models? In this paper, we study these questions, with three contributions. First, we establish theoretically that with a sufficiently large embedding dimension, DE models have the capacity to model a broad class of score distributions. Second, we show empirically that on real-world problems, DE models may overfit to spurious correlations in the training set, and thus under-perform on test samples. Third, to mitigate this behaviour, we propose a suitable distillation strategy, and confirm its practical efficacy on the MSMARCO-Passage and Natural Questions benchmarks.}
}

Endnote

%0 Conference Paper
%T In defense of dual-encoders for neural ranking
%A Aditya Menon
%A Sadeep Jayasumana
%A Ankit Singh Rawat
%A Seungyeon Kim
%A Sashank Reddi
%A Sanjiv Kumar
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-menon22a
%I PMLR
%P 15376--15400
%U https://proceedings.mlr.press/v162/menon22a.html
%V 162
%X Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query. There are two broad flavours of such models: cross-attention (CA) models, which learn a joint embedding for the query and document, and dual-encoder (DE) models, which learn separate embeddings for the query and document. Empirically, CA models are often found to be more accurate, which has motivated a series of works seeking to bridge this gap. However, a more fundamental question remains less explored: does this performance gap reflect an inherent limitation in the capacity of DE models, or a limitation in the training of such models? And does such an understanding suggest a principled means of improving DE models? In this paper, we study these questions, with three contributions. First, we establish theoretically that with a sufficiently large embedding dimension, DE models have the capacity to model a broad class of score distributions. Second, we show empirically that on real-world problems, DE models may overfit to spurious correlations in the training set, and thus under-perform on test samples. Third, to mitigate this behaviour, we propose a suitable distillation strategy, and confirm its practical efficacy on the MSMARCO-Passage and Natural Questions benchmarks.

APA


Menon, A., Jayasumana, S., Rawat, A.S., Kim, S., Reddi, S. & Kumar, S. (2022). In defense of dual-encoders for neural ranking. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:15376-15400. Available from https://proceedings.mlr.press/v162/menon22a.html.
