Separating Knowledge and Perception with Procedural Data

Adrian Rodriguez-Munoz, Manel Baradad, Phillip Isola, Antonio Torralba
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:51890-51904, 2025.

Abstract

We train representation models with procedural data only, and apply them on visual similarity, classification, and semantic segmentation tasks without further training by using visual memory—an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within 1% on NIGHTS visual similarity, outperforms by 8% and 15% on CUB200 and Flowers102 fine-grained classification, and is within 10% on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an $R^2$ on COCO within 10% of the models trained on real data. Finally, we analyze procedural versus real data models, showing that parts of the same object have dissimilar representations in procedural models, resulting in incorrect searches in memory and explaining the remaining performance gap.

Cite this Paper
BibTeX
@InProceedings{pmlr-v267-rodriguez-munoz25a,
  title     = {Separating Knowledge and Perception with Procedural Data},
  author    = {Rodriguez-Munoz, Adrian and Baradad, Manel and Isola, Phillip and Torralba, Antonio},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {51890--51904},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/rodriguez-munoz25a/rodriguez-munoz25a.pdf},
  url       = {https://proceedings.mlr.press/v267/rodriguez-munoz25a.html},
  abstract  = {We train representation models with procedural data only, and apply them on visual similarity, classification, and semantic segmentation tasks without further training by using visual memory—an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within 1% on NIGHTS visual similarity, outperforms by 8% and 15% on CUB200 and Flowers102 fine-grained classification, and is within 10% on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an $R^2$ on COCO within 10% of the models trained on real data. Finally, we analyze procedural versus real data models, showing that parts of the same object have dissimilar representations in procedural models, resulting in incorrect searches in memory and explaining the remaining performance gap.}
}
Endnote
%0 Conference Paper
%T Separating Knowledge and Perception with Procedural Data
%A Adrian Rodriguez-Munoz
%A Manel Baradad
%A Phillip Isola
%A Antonio Torralba
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-rodriguez-munoz25a
%I PMLR
%P 51890--51904
%U https://proceedings.mlr.press/v267/rodriguez-munoz25a.html
%V 267
%X We train representation models with procedural data only, and apply them on visual similarity, classification, and semantic segmentation tasks without further training by using visual memory—an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within 1% on NIGHTS visual similarity, outperforms by 8% and 15% on CUB200 and Flowers102 fine-grained classification, and is within 10% on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an $R^2$ on COCO within 10% of the models trained on real data. Finally, we analyze procedural versus real data models, showing that parts of the same object have dissimilar representations in procedural models, resulting in incorrect searches in memory and explaining the remaining performance gap.
APA
Rodriguez-Munoz, A., Baradad, M., Isola, P. & Torralba, A. (2025). Separating Knowledge and Perception with Procedural Data. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:51890-51904. Available from https://proceedings.mlr.press/v267/rodriguez-munoz25a.html.