Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning

Bryan Lincoln Marques De Oliveira; Luana Guedes Barros Martins; Bruno Brandão; Murilo Lopes Da Luz; Telma Woerle De Lima Soares; Luckeciano Carvalho Melo

Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:12689-12717, 2025.

Abstract

Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that transforms the classic 8-tile puzzle into a visual RL task with images drawn from arbitrarily large datasets. SPGym’s key innovation lies in its ability to precisely control representation learning complexity through adjustable grid sizes and image pools, while maintaining fixed environment dynamics, observation, and action spaces. This design enables researchers to isolate and scale the visual representation challenge independently of other learning components. Through extensive experiments with model-free and model-based RL algorithms, we uncover fundamental limitations in current methods’ ability to handle visual diversity. As we increase the pool of possible images, all algorithms exhibit in- and out-of-distribution performance degradation, with sophisticated representation learning techniques often underperforming simpler approaches like data augmentation. These findings highlight critical gaps in visual representation learning for RL and establish SPGym as a valuable tool for driving progress in robust, generalizable decision-making systems.

Cite this Paper

BibTeX

@InProceedings{pmlr-v267-de-oliveira25a,
  title = 	 {Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning},
  author =       {De Oliveira, Bryan Lincoln Marques and Martins, Luana Guedes Barros and Brand\~{a}o, Bruno and Luz, Murilo Lopes Da and De Lima Soares, Telma Woerle and Carvalho Melo, Luckeciano},
  booktitle = 	 {Proceedings of the 42nd International Conference on Machine Learning},
  pages = 	 {12689--12717},
  year = 	 {2025},
  editor = 	 {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume = 	 {267},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--19 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v267/main/assets/de-oliveira25a/de-oliveira25a.pdf},
  url = 	 {https://proceedings.mlr.press/v267/de-oliveira25a.html},
  abstract = 	 {Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that transforms the classic 8-tile puzzle into a visual RL task with images drawn from arbitrarily large datasets. SPGym’s key innovation lies in its ability to precisely control representation learning complexity through adjustable grid sizes and image pools, while maintaining fixed environment dynamics, observation, and action spaces. This design enables researchers to isolate and scale the visual representation challenge independently of other learning components. Through extensive experiments with model-free and model-based RL algorithms, we uncover fundamental limitations in current methods’ ability to handle visual diversity. As we increase the pool of possible images, all algorithms exhibit in- and out-of-distribution performance degradation, with sophisticated representation learning techniques often underperforming simpler approaches like data augmentation. These findings highlight critical gaps in visual representation learning for RL and establish SPGym as a valuable tool for driving progress in robust, generalizable decision-making systems.}
}

Endnote

%0 Conference Paper
%T Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning
%A Bryan Lincoln Marques De Oliveira
%A Luana Guedes Barros Martins
%A Bruno Brandão
%A Murilo Lopes Da Luz
%A Telma Woerle De Lima Soares
%A Luckeciano Carvalho Melo
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-de-oliveira25a
%I PMLR
%P 12689--12717
%U https://proceedings.mlr.press/v267/de-oliveira25a.html
%V 267
%X Effective visual representation learning is crucial for reinforcement learning (RL) agents to extract task-relevant information from raw sensory inputs and generalize across diverse environments. However, existing RL benchmarks lack the ability to systematically evaluate representation learning capabilities in isolation from other learning challenges. To address this gap, we introduce the Sliding Puzzles Gym (SPGym), a novel benchmark that transforms the classic 8-tile puzzle into a visual RL task with images drawn from arbitrarily large datasets. SPGym’s key innovation lies in its ability to precisely control representation learning complexity through adjustable grid sizes and image pools, while maintaining fixed environment dynamics, observation, and action spaces. This design enables researchers to isolate and scale the visual representation challenge independently of other learning components. Through extensive experiments with model-free and model-based RL algorithms, we uncover fundamental limitations in current methods’ ability to handle visual diversity. As we increase the pool of possible images, all algorithms exhibit in- and out-of-distribution performance degradation, with sophisticated representation learning techniques often underperforming simpler approaches like data augmentation. These findings highlight critical gaps in visual representation learning for RL and establish SPGym as a valuable tool for driving progress in robust, generalizable decision-making systems.

APA

De Oliveira, B.L.M., Martins, L.G.B., Brandão, B., Luz, M.L.D., De Lima Soares, T.W. & Carvalho Melo, L. (2025). Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:12689-12717. Available from https://proceedings.mlr.press/v267/de-oliveira25a.html.
