USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval

Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Wittawat Jitkrittum, Veeranjaneyulu Sadhanala, Sadeep Jayasumana, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:24486-24508, 2024.

Abstract

Modern information retrieval (IR) systems consist of multiple stages, such as retrieval and ranking, with Transformer-based models achieving state-of-the-art performance at each stage. In this paper, we challenge the tradition of using separate models for different stages and ask whether a single Transformer encoder can provide the relevance scores needed at each stage. We present USTAD, a new unified approach to train a single network that can provide powerful ranking scores as a cross-encoder (CE) model as well as factorized embeddings for large-scale retrieval as a dual-encoder (DE) model. Empirically, we find that a single USTAD model is competitive with separate ranking CE and retrieval DE models. Furthermore, USTAD combines well with a novel embedding-matching-based distillation, significantly improving CE-to-DE distillation. It further motivates novel asymmetric architectures for student models that ensure better embedding alignment between student and teacher while keeping online inference cost small. On standard benchmarks such as MSMARCO, we demonstrate that USTAD with our proposed distillation method yields asymmetric students with only 1/10th of the trainable parameters while retaining 95-97% of the teacher's performance.
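As a rough illustration of the dual-use idea described in the abstract, the sketch below shows how one Transformer encoder could expose both scoring modes: a cross-encoder score computed from a concatenated query-document input, and a dual-encoder score computed as a dot product of separately encoded embeddings. This is a hypothetical minimal example, not the authors' implementation; all module names, dimensions, pooling choices, and scoring heads are assumptions made for illustration.

# Minimal sketch (assumed design, not the paper's code): a single encoder that
# serves both as a cross-encoder (CE) and as a dual-encoder (DE).
import torch
import torch.nn as nn

class UnifiedEncoder(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.ce_head = nn.Linear(dim, 1)  # scalar relevance score for CE mode (assumed head)

    def encode(self, token_ids):
        # Mean-pool token representations into a single embedding (assumption).
        h = self.encoder(self.embed(token_ids))
        return h.mean(dim=1)

    def ce_score(self, pair_ids):
        # CE mode: query and document tokens are concatenated before encoding.
        return self.ce_head(self.encode(pair_ids)).squeeze(-1)

    def de_score(self, query_ids, doc_ids):
        # DE mode: encode query and document independently and score by dot product,
        # so document embeddings can be pre-computed and indexed for retrieval.
        q, d = self.encode(query_ids), self.encode(doc_ids)
        return (q * d).sum(dim=-1)

model = UnifiedEncoder()
query = torch.randint(0, 30522, (2, 16))
doc = torch.randint(0, 30522, (2, 64))
pair = torch.cat([query, doc], dim=1)
print(model.ce_score(pair).shape, model.de_score(query, doc).shape)  # both: torch.Size([2])

The appeal of sharing one set of weights is that the DE path allows document embeddings to be pre-computed and indexed for large-scale retrieval, while the CE path reuses the same network for the more expensive reranking stage.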

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kim24ad,
  title     = {{USTAD}: Unified Single-model Training Achieving Diverse Scores for Information Retrieval},
  author    = {Kim, Seungyeon and Rawat, Ankit Singh and Zaheer, Manzil and Jitkrittum, Wittawat and Sadhanala, Veeranjaneyulu and Jayasumana, Sadeep and Menon, Aditya Krishna and Fergus, Rob and Kumar, Sanjiv},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {24486--24508},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kim24ad/kim24ad.pdf},
  url       = {https://proceedings.mlr.press/v235/kim24ad.html},
  abstract  = {Modern information retrieval (IR) systems consists of multiple stages like retrieval and ranking, with Transformer-based models achieving state-of-the-art performance at each stage. In this paper, we challenge the tradition of using separate models for different stages and ask if a single Transformer encoder can provide relevance score needed in each stage. We present USTAD – a new unified approach to train a single network that can provide powerful ranking scores as a cross-encoder (CE) model as well as factorized embeddings for large-scale retrieval as a dual-encoder (DE) model. Empirically, we find a single USTAD model to be competitive to separate ranking CE and retrieval DE models. Furthermore, USTAD combines well with a novel embedding matching-based distillation, significantly improving CE to DE distillation. It further motivates novel asymmetric architectures for student models to ensure a better embedding alignment between the student and the teacher while ensuring small online inference cost. On standard benchmarks like MSMARCO, we demonstrate that USTAD with our proposed distillation method leads to asymmetric students with only 1/10th trainable parameter but retaining 95-97% of the teacher performance.}
}
Endnote
%0 Conference Paper
%T USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval
%A Seungyeon Kim
%A Ankit Singh Rawat
%A Manzil Zaheer
%A Wittawat Jitkrittum
%A Veeranjaneyulu Sadhanala
%A Sadeep Jayasumana
%A Aditya Krishna Menon
%A Rob Fergus
%A Sanjiv Kumar
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kim24ad
%I PMLR
%P 24486--24508
%U https://proceedings.mlr.press/v235/kim24ad.html
%V 235
%X Modern information retrieval (IR) systems consists of multiple stages like retrieval and ranking, with Transformer-based models achieving state-of-the-art performance at each stage. In this paper, we challenge the tradition of using separate models for different stages and ask if a single Transformer encoder can provide relevance score needed in each stage. We present USTAD – a new unified approach to train a single network that can provide powerful ranking scores as a cross-encoder (CE) model as well as factorized embeddings for large-scale retrieval as a dual-encoder (DE) model. Empirically, we find a single USTAD model to be competitive to separate ranking CE and retrieval DE models. Furthermore, USTAD combines well with a novel embedding matching-based distillation, significantly improving CE to DE distillation. It further motivates novel asymmetric architectures for student models to ensure a better embedding alignment between the student and the teacher while ensuring small online inference cost. On standard benchmarks like MSMARCO, we demonstrate that USTAD with our proposed distillation method leads to asymmetric students with only 1/10th trainable parameter but retaining 95-97% of the teacher performance.
APA
Kim, S., Rawat, A.S., Zaheer, M., Jitkrittum, W., Sadhanala, V., Jayasumana, S., Menon, A.K., Fergus, R. & Kumar, S. (2024). USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:24486-24508. Available from https://proceedings.mlr.press/v235/kim24ad.html.
