LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification

Yiding Lu, Mouxing Yang, Dezhong Peng, Peng Hu, Yijie Lin, Xi Peng
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:40868-40887, 2025.

Abstract

Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
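To make the retrieval loop concrete, the following is a minimal sketch of the interaction the abstract describes: a retriever ranks gallery images against the current description, a questioner generates a targeted question from the visual and textual context, and the witness's answer refines the description for the next round. All interfaces and parameters below (retriever, questioner, witness, the confidence threshold, max_rounds) are illustrative assumptions, not the paper's actual API.

def interactive_reid(retriever, questioner, witness, description, max_rounds=5):
    """Iteratively refine a partial witness description via dialogue (sketch)."""
    for _ in range(max_rounds):
        # Rank the gallery against the current (possibly partial) description.
        candidates, confidence = retriever.rank(description)
        if confidence > 0.9:  # assumed stopping criterion, not from the paper
            return candidates
        # Generate a targeted question from visual context (top-ranked
        # candidates) and textual context (the dialogue so far).
        question = questioner.ask(candidates, description)
        # The witness's answer supplies additional fine-grained attributes.
        description = description + " " + witness.answer(question)
    return candidates

Note that the looking-forward strategy mentioned in the abstract applies at training time rather than in this inference loop: candidate questions would be scored by how much their answers improve retrieval, with the most informative questions kept as supervision for the question model.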

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lu25s,
  title     = {{LL}a{VA}-{R}e{ID}: Selective Multi-image Questioner for Interactive Person Re-Identification},
  author    = {Lu, Yiding and Yang, Mouxing and Peng, Dezhong and Hu, Peng and Lin, Yijie and Peng, Xi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {40868--40887},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lu25s/lu25s.pdf},
  url       = {https://proceedings.mlr.press/v267/lu25s.html},
  abstract  = {Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.}
}
Endnote
%0 Conference Paper
%T LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
%A Yiding Lu
%A Mouxing Yang
%A Dezhong Peng
%A Peng Hu
%A Yijie Lin
%A Xi Peng
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lu25s
%I PMLR
%P 40868--40887
%U https://proceedings.mlr.press/v267/lu25s.html
%V 267
%X Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
APA
Lu, Y., Yang, M., Peng, D., Hu, P., Lin, Y. & Peng, X. (2025). LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:40868-40887. Available from https://proceedings.mlr.press/v267/lu25s.html.
