Simplified priors for Object-Centric Learning

Vihang Prakash Patil, Andreas Radler, Daniel Klotz, Sepp Hochreiter
Proceedings of The 3rd Conference on Lifelong Learning Agents, PMLR 274:29-48, 2025.

Abstract

Humans excel at abstracting data and constructing *reusable* concepts, a capability lacking in current continual learning systems. The field of object-centric learning addresses this by developing abstract representations, or slots, from data without human supervision. Different methods have been proposed to tackle this task for images, but most are overly complex, non-differentiable, or poorly scalable. In this paper, we introduce a conceptually simple, fully differentiable, non-iterative, and scalable method called **SAMP** (**S**implified Slot **A**ttention with **M**ax Pool **P**riors). It can be implemented using only Convolution, MaxPool, and Attention layers. Our method encodes the input image with a Convolutional Neural Network and then uses a branch of alternating Convolution and MaxPool layers to create specialized sub-networks and extract primitive slots. These primitive slots are then used as queries for a Simplified Slot Attention over the encoded image. Despite its simplicity, our method is competitive with or outperforms previous methods on standard benchmarks.
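To make the pipeline described above concrete, here is a minimal, hypothetical PyTorch sketch of such an architecture: a CNN encoder, a branch of alternating Convolution and MaxPool layers that distills the feature map into primitive slots, and a single non-iterative cross-attention step that uses those primitives as queries. All module names, layer sizes, and the exact attention normalization are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class SAMPSketch(nn.Module):
    """Hypothetical sketch of the SAMP pipeline from the abstract; layer
    sizes and normalization details are assumptions, not the paper's code."""

    def __init__(self, in_channels=3, dim=64, slot_grid=2):
        super().__init__()
        # CNN encoder: maps the image to a feature map, resolution preserved.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, dim, 5, padding=2), nn.ReLU(),
            nn.Conv2d(dim, dim, 5, padding=2), nn.ReLU(),
        )
        # Prior branch: alternating Conv and MaxPool layers; the final
        # adaptive pool leaves a slot_grid x slot_grid map whose cells act
        # as primitive slots (here 2x2 = 4 slots).
        self.prior_branch = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveMaxPool2d(slot_grid),
        )
        # Projections for a single (non-iterative) cross-attention pass.
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, image):
        feats = self.encoder(image)                    # (B, D, H, W)
        b, d, h, w = feats.shape
        prims = self.prior_branch(feats)               # (B, D, s, s)
        prims = prims.flatten(2).transpose(1, 2)       # (B, S, D)
        inputs = feats.flatten(2).transpose(1, 2)      # (B, H*W, D)
        q = self.to_q(prims)
        k, v = self.to_k(inputs), self.to_v(inputs)
        logits = torch.einsum('bsd,bnd->bsn', q, k) * d ** -0.5
        # Slot-attention-style normalization: softmax over the slot axis
        # makes slots compete for each image location, then each slot
        # takes a weighted mean over locations.
        attn = logits.softmax(dim=1)
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        slots = torch.einsum('bsn,bnd->bsd', attn, v)  # (B, S, D)
        return slots


# Example: 4 primitive slots extracted from a 64x64 image.
model = SAMPSketch()
slots = model(torch.randn(1, 3, 64, 64))
print(slots.shape)  # torch.Size([1, 4, 64])
```

Note the contrast with standard Slot Attention: instead of randomly initialized slots refined over several GRU iterations, the queries here come directly from the max-pooled prior branch, so one attention pass suffices and the whole model stays a plain feed-forward network.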

Cite this Paper


BibTeX
@InProceedings{pmlr-v274-patil25a,
  title     = {Simplified priors for Object-Centric Learning},
  author    = {Patil, Vihang Prakash and Radler, Andreas and Klotz, Daniel and Hochreiter, Sepp},
  booktitle = {Proceedings of The 3rd Conference on Lifelong Learning Agents},
  pages     = {29--48},
  year      = {2025},
  editor    = {Lomonaco, Vincenzo and Melacci, Stefano and Tuytelaars, Tinne and Chandar, Sarath and Pascanu, Razvan},
  volume    = {274},
  series    = {Proceedings of Machine Learning Research},
  month     = {29 Jul--01 Aug},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v274/main/assets/patil25a/patil25a.pdf},
  url       = {https://proceedings.mlr.press/v274/patil25a.html},
  abstract  = {Humans excel at abstracting data and constructing *reusable* concepts, a capability lacking in current continual learning systems. The field of object-centric learning addresses this by developing abstract representations, or slots, from data without human supervision. Different methods have been proposed to tackle this task for images, but most are overly complex, non-differentiable, or poorly scalable. In this paper, we introduce a conceptually simple, fully differentiable, non-iterative, and scalable method called **SAMP** (**S**implified Slot **A**ttention with **M**ax Pool **P**riors). It can be implemented using only Convolution, MaxPool, and Attention layers. Our method encodes the input image with a Convolutional Neural Network and then uses a branch of alternating Convolution and MaxPool layers to create specialized sub-networks and extract primitive slots. These primitive slots are then used as queries for a Simplified Slot Attention over the encoded image. Despite its simplicity, our method is competitive with or outperforms previous methods on standard benchmarks.}
}
Endnote
%0 Conference Paper
%T Simplified priors for Object-Centric Learning
%A Vihang Prakash Patil
%A Andreas Radler
%A Daniel Klotz
%A Sepp Hochreiter
%B Proceedings of The 3rd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2025
%E Vincenzo Lomonaco
%E Stefano Melacci
%E Tinne Tuytelaars
%E Sarath Chandar
%E Razvan Pascanu
%F pmlr-v274-patil25a
%I PMLR
%P 29--48
%U https://proceedings.mlr.press/v274/patil25a.html
%V 274
%X Humans excel at abstracting data and constructing *reusable* concepts, a capability lacking in current continual learning systems. The field of object-centric learning addresses this by developing abstract representations, or slots, from data without human supervision. Different methods have been proposed to tackle this task for images, but most are overly complex, non-differentiable, or poorly scalable. In this paper, we introduce a conceptually simple, fully differentiable, non-iterative, and scalable method called **SAMP** (**S**implified Slot **A**ttention with **M**ax Pool **P**riors). It can be implemented using only Convolution, MaxPool, and Attention layers. Our method encodes the input image with a Convolutional Neural Network and then uses a branch of alternating Convolution and MaxPool layers to create specialized sub-networks and extract primitive slots. These primitive slots are then used as queries for a Simplified Slot Attention over the encoded image. Despite its simplicity, our method is competitive with or outperforms previous methods on standard benchmarks.
APA
Patil, V.P., Radler, A., Klotz, D. & Hochreiter, S. (2025). Simplified priors for Object-Centric Learning. Proceedings of The 3rd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 274:29-48. Available from https://proceedings.mlr.press/v274/patil25a.html.
