LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification

Yiding Lu, Mouxing Yang, Dezhong Peng, Peng Hu, Yijie Lin, Xi Peng
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:40868-40887, 2025.

Abstract

Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
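To make the retrieval loop concrete, the following is a minimal sketch of the interaction the abstract describes: a retriever ranks gallery images against the current description, a questioner generates a targeted question from the visual and textual context, and the witness's answer refines the description for the next round. All interfaces and parameters below (retriever, questioner, witness, the confidence threshold, max_rounds) are illustrative assumptions, not the paper's actual API.

def interactive_reid(retriever, questioner, witness, description, max_rounds=5):
    """Iteratively refine a partial witness description via dialogue (sketch)."""
    for _ in range(max_rounds):
        # Rank the gallery against the current (possibly partial) description.
        candidates, confidence = retriever.rank(description)
        if confidence > 0.9:  # assumed stopping criterion, not from the paper
            return candidates
        # Generate a targeted question from visual context (top-ranked
        # candidates) and textual context (the dialogue so far).
        question = questioner.ask(candidates, description)
        # The witness's answer supplies additional fine-grained attributes.
        description = description + " " + witness.answer(question)
    return candidates

Note that the looking-forward strategy mentioned in the abstract applies at training time rather than in this inference loop: candidate questions would be scored by how much their answers improve retrieval, with the most informative questions kept as supervision for the question model.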

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lu25s,
  title     = {{LL}a{VA}-{R}e{ID}: Selective Multi-image Questioner for Interactive Person Re-Identification},
  author    = {Lu, Yiding and Yang, Mouxing and Peng, Dezhong and Hu, Peng and Lin, Yijie and Peng, Xi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {40868--40887},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lu25s/lu25s.pdf},
  url       = {https://proceedings.mlr.press/v267/lu25s.html},
  abstract  = {Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.}
}
Endnote
%0 Conference Paper
%T LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
%A Yiding Lu
%A Mouxing Yang
%A Dezhong Peng
%A Peng Hu
%A Yijie Lin
%A Xi Peng
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lu25s
%I PMLR
%P 40868--40887
%U https://proceedings.mlr.press/v267/lu25s.html
%V 267
%X Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
APA
Lu, Y., Yang, M., Peng, D., Hu, P., Lin, Y. & Peng, X. (2025). LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:40868-40887. Available from https://proceedings.mlr.press/v267/lu25s.html.
