RIS: Region-to-Image Search using ViT-like Embeddings

Oussama Zayene; Lucas Genoud; Jean Hennebert; Houda Chabbi Drissi; Benoit de Raemy

Proceedings of the Fourth Swiss AI Days, PMLR 309:56-66, 2026.

Abstract

We propose RIS (Region-to-Image Search), a two-stage framework for localized visual retrieval. RIS performs structural re-ranking directly within the latent embedding space of Vision Transformers, such as SigLIP2 and I-JEPA, bypassing traditional pixel-level verification. By matching a query Region of Interest (ROI) through a spatially-consistent region-growing algorithm, the framework ensures geometric coherence across latent representations. Preliminary qualitative results demonstrate that this embedding-based re-ranking improves Top-5 retrieval accuracy by at least 10% over standalone global methods, providing a robust and efficient mechanism for localized forensic search.
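
The abstract names the two stages but not their details, so the following is only a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes that stage one is a cosine-similarity shortlist over global image embeddings, and that stage two grows a matched region over patch embeddings under a single fixed grid offset. All function names, the seed choice (the ROI's centre patch), the 4-neighbour growth rule, and the 0.5 similarity threshold are hypothetical and chosen for illustration.

```python
from collections import deque

import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def global_shortlist(query_vec, gallery_vecs, k):
    """Stage 1 (assumed): top-k gallery indices by global cosine similarity."""
    sims = l2_normalize(gallery_vecs) @ l2_normalize(query_vec)
    return np.argsort(-sims)[:k]


def region_growing_score(roi, cand, threshold=0.5):
    """Stage 2 (hypothetical): fraction of ROI patches matched by region growing.

    roi:  (h, w, d) patch embeddings of the query region
    cand: (H, W, d) patch embeddings of one candidate image
    """
    roi, cand = l2_normalize(roi), l2_normalize(cand)
    h, w, _ = roi.shape
    H, W, _ = cand.shape

    # Seed: the candidate patch most similar to the ROI's centre patch.
    ci, cj = h // 2, w // 2
    seed_sims = cand @ roi[ci, cj]  # shape (H, W)
    si, sj = np.unravel_index(np.argmax(seed_sims), seed_sims.shape)
    if seed_sims[si, sj] < threshold:
        return 0.0

    # Every ROI patch must match the candidate patch at the same grid offset,
    # which is what keeps the grown region spatially consistent.
    di, dj = si - ci, sj - cj
    visited = {(ci, cj)}
    queue = deque([(ci, cj)])
    matched = 0
    while queue:
        i, j = queue.popleft()
        matched += 1
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if (ni, nj) in visited or not (0 <= ni < h and 0 <= nj < w):
                continue
            visited.add((ni, nj))
            ti, tj = ni + di, nj + dj
            inside = 0 <= ti < H and 0 <= tj < W
            if inside and roi[ni, nj] @ cand[ti, tj] >= threshold:
                queue.append((ni, nj))
    return matched / (h * w)


def two_stage_search(query_vec, roi, gallery_vecs, gallery_patches, k=5):
    """Shortlist globally, then re-rank the shortlist by region-growing score."""
    shortlist = global_shortlist(query_vec, gallery_vecs, k)
    scores = [region_growing_score(roi, gallery_patches[i]) for i in shortlist]
    # A stable sort keeps the global order among candidates with equal scores.
    order = np.argsort(-np.asarray(scores), kind="stable")
    return [int(shortlist[i]) for i in order]
```

In this sketch the re-ranking never touches pixels: it only compares patch embeddings, so a candidate that ranks low globally can still move to the top if a contiguous block of its patches matches the query region.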

Cite this Paper

BibTeX

@InProceedings{pmlr-v309-zayene26a,
  title     = {RIS: Region-to-Image Search using ViT-like Embeddings},
  author    = {Zayene, Oussama and Genoud, Lucas and Hennebert, Jean and Drissi, Houda Chabbi and de Raemy, Benoit},
  booktitle = {Proceedings of the Fourth Swiss AI Days},
  pages     = {56--66},
  year      = {2026},
  editor    = {Kucharavy, Andrei and Delgado, Pamela and Schürch Todeschini, Valérie and Rumley, Sébastien},
  volume    = {309},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--25 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v309/main/assets/zayene26a/zayene26a.pdf},
  url       = {https://proceedings.mlr.press/v309/zayene26a.html},
  abstract  = {We propose RIS (Region-to-Image Search), a two-stage framework for localized visual retrieval. RIS performs structural re-ranking directly within the latent embedding space of Vision Transformers, such as SigLIP2 and I-JEPA, bypassing traditional pixel-level verification. By matching a query Region of Interest (ROI) through a spatially-consistent region-growing algorithm, the framework ensures geometric coherence across latent representations. Preliminary qualitative results demonstrate that this embedding-based re-ranking improves Top-5 retrieval accuracy by at least 10\% over standalone global methods, providing a robust and efficient mechanism for localized forensic search.}
}

Endnote

%0 Conference Paper
%T RIS: Region-to-Image Search using ViT-like Embeddings
%A Oussama Zayene
%A Lucas Genoud
%A Jean Hennebert
%A Houda Chabbi Drissi
%A Benoit de Raemy
%B Proceedings of the Fourth Swiss AI Days
%C Proceedings of Machine Learning Research
%D 2026
%E Andrei Kucharavy
%E Pamela Delgado
%E Valérie Schürch Todeschini
%E Sébastien Rumley
%F pmlr-v309-zayene26a
%I PMLR
%P 56--66
%U https://proceedings.mlr.press/v309/zayene26a.html
%V 309
%X We propose RIS (Region-to-Image Search), a two-stage framework for localized visual retrieval. RIS performs structural re-ranking directly within the latent embedding space of Vision Transformers, such as SigLIP2 and I-JEPA, bypassing traditional pixel-level verification. By matching a query Region of Interest (ROI) through a spatially-consistent region-growing algorithm, the framework ensures geometric coherence across latent representations. Preliminary qualitative results demonstrate that this embedding-based re-ranking improves Top-5 retrieval accuracy by at least 10% over standalone global methods, providing a robust and efficient mechanism for localized forensic search.

APA

Zayene, O., Genoud, L., Hennebert, J., Drissi, H.C., & de Raemy, B. (2026). RIS: Region-to-Image Search using ViT-like Embeddings. Proceedings of the Fourth Swiss AI Days, in Proceedings of Machine Learning Research 309:56-66. Available from https://proceedings.mlr.press/v309/zayene26a.html.
