Contradiction Retrieval via Contrastive Learning with Sparsity

Haike Xu, Zongyu Lin, Kai-Wei Chang, Yizhou Sun, Piotr Indyk
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69478-69506, 2025.

Abstract

Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve arguments that contradict a query from large document corpora, existing methods such as similarity search and cross-encoder models exhibit different limitations. To address these challenges, we introduce a novel approach, SparseCL, which leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences. Our method utilizes a combined metric of cosine similarity and a sparsity function to efficiently identify and retrieve documents that contradict a given query. This approach dramatically enhances the speed of contradiction detection by reducing the need for exhaustive document comparisons to simple vector calculations. We conduct contradiction retrieval experiments on Arguana, MSMARCO, and HotpotQA, where our method produces an average improvement of 11.0% across different models. We also validate our method on downstream tasks like natural language inference and cleaning corrupted corpora. This paper outlines a promising direction for non-similarity-based information retrieval which is currently underexplored.
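The combined retrieval metric described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the sparsity function is the Hoyer measure applied to the difference of two embedding vectors, and the weight `alpha` balancing the two terms is a hypothetical parameter introduced here for illustration.

```python
import numpy as np

def cosine_similarity(q: np.ndarray, d: np.ndarray) -> float:
    """Standard cosine similarity between two embedding vectors."""
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def hoyer_sparsity(x: np.ndarray) -> float:
    """Hoyer sparsity measure: 1.0 for a 1-sparse vector, 0.0 for a
    uniform vector, based on the ratio of the L1 and L2 norms."""
    n = x.size
    ratio = np.linalg.norm(x, 1) / np.linalg.norm(x, 2)
    return float((np.sqrt(n) - ratio) / (np.sqrt(n) - 1))

def contradiction_score(q_emb: np.ndarray, d_emb: np.ndarray,
                        q_sparse: np.ndarray, d_sparse: np.ndarray,
                        alpha: float = 1.0) -> float:
    """Combined score: cosine similarity between regular embeddings plus
    the sparsity of the difference of the specially trained embeddings.
    Documents are ranked by this score; the top-scoring ones are
    retrieved as candidate contradictions."""
    return cosine_similarity(q_emb, d_emb) + alpha * hoyer_sparsity(q_sparse - d_sparse)
```

Because both terms reduce to per-pair vector arithmetic, candidate documents can be scored against a query without running a cross-encoder over every query-document pair, which is the speed advantage the abstract refers to.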

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-xu25s,
  title = {Contradiction Retrieval via Contrastive Learning with Sparsity},
  author = {Xu, Haike and Lin, Zongyu and Chang, Kai-Wei and Sun, Yizhou and Indyk, Piotr},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages = {69478--69506},
  year = {2025},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = {267},
  series = {Proceedings of Machine Learning Research},
  month = {13--19 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xu25s/xu25s.pdf},
  url = {https://proceedings.mlr.press/v267/xu25s.html},
  abstract = {Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve arguments that contradict a query from large document corpora, existing methods such as similarity search and cross-encoder models exhibit different limitations. To address these challenges, we introduce a novel approach, SparseCL, which leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences. Our method utilizes a combined metric of cosine similarity and a sparsity function to efficiently identify and retrieve documents that contradict a given query. This approach dramatically enhances the speed of contradiction detection by reducing the need for exhaustive document comparisons to simple vector calculations. We conduct contradiction retrieval experiments on Arguana, MSMARCO, and HotpotQA, where our method produces an average improvement of $11.0\%$ across different models. We also validate our method on downstream tasks like natural language inference and cleaning corrupted corpora. This paper outlines a promising direction for non-similarity-based information retrieval which is currently underexplored.}
}
Endnote
%0 Conference Paper
%T Contradiction Retrieval via Contrastive Learning with Sparsity
%A Haike Xu
%A Zongyu Lin
%A Kai-Wei Chang
%A Yizhou Sun
%A Piotr Indyk
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-xu25s
%I PMLR
%P 69478--69506
%U https://proceedings.mlr.press/v267/xu25s.html
%V 267
%X Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve arguments that contradict a query from large document corpora, existing methods such as similarity search and cross-encoder models exhibit different limitations. To address these challenges, we introduce a novel approach, SparseCL, which leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences. Our method utilizes a combined metric of cosine similarity and a sparsity function to efficiently identify and retrieve documents that contradict a given query. This approach dramatically enhances the speed of contradiction detection by reducing the need for exhaustive document comparisons to simple vector calculations. We conduct contradiction retrieval experiments on Arguana, MSMARCO, and HotpotQA, where our method produces an average improvement of 11.0% across different models. We also validate our method on downstream tasks like natural language inference and cleaning corrupted corpora. This paper outlines a promising direction for non-similarity-based information retrieval which is currently underexplored.
APA
Xu, H., Lin, Z., Chang, K., Sun, Y. & Indyk, P. (2025). Contradiction Retrieval via Contrastive Learning with Sparsity. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:69478-69506. Available from https://proceedings.mlr.press/v267/xu25s.html.
