A Model of Place Field Reorganization During Reward Maximization

M Ganesh Kumar, Blake Bordelon, Jacob A Zavatone-Veth, Cengiz Pehlevan
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:31892-31929, 2025.

Abstract

When rodents learn to navigate in a novel environment, a high density of place fields emerges at reward locations, fields elongate against the trajectory, and individual fields change spatial selectivity while demonstrating stable behavior. Why place fields demonstrate these characteristic phenomena during learning remains elusive. We develop a normative framework using a reward maximization objective, whereby the temporal difference (TD) error drives place field reorganization to improve policy learning. Place fields are modeled using Gaussian radial basis functions to represent states in an environment, and directly synapse to an actor-critic for policy learning. Each field’s amplitude, center, and width, as well as downstream weights, are updated online at each time step to maximize rewards. We demonstrate that this framework unifies three disparate phenomena observed in navigation experiments. Furthermore, we show that these place field phenomena improve policy convergence when learning to navigate to a single target and relearning multiple new targets. To conclude, we develop a simple normative model that recapitulates several aspects of hippocampal place field learning dynamics and unifies mechanisms to offer testable predictions for future experiments.
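To make the learning rule in the abstract concrete, the sketch below gives a minimal, assumed Python/NumPy implementation of the described setup: Gaussian radial basis place fields encode position on a 1D track, feed a linear critic and a softmax actor, and a single TD error drives online updates of the readout weights and of each field's amplitude, center, and width. All sizes, hyperparameters, and the function names place_activity and td_update are hypothetical illustrations, not the authors' code.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and hyperparameters (not taken from the paper).
n_fields, n_actions = 32, 2
gamma, lr = 0.95, 0.01

amp = np.ones(n_fields)                     # field amplitudes
mu = rng.uniform(0.0, 1.0, n_fields)        # field centers on a 1D track
sigma = 0.1 * np.ones(n_fields)             # field widths
w_v = np.zeros(n_fields)                    # critic readout weights
w_pi = np.zeros((n_actions, n_fields))      # actor readout weights


def place_activity(x):
    """Gaussian radial basis activity of all place fields at position x."""
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def td_update(x, x_next, reward, action):
    """One online step: the TD error drives readout and place-field updates."""
    global amp, mu, sigma, w_v, w_pi
    f, f_next = place_activity(x), place_activity(x_next)
    delta = reward + gamma * w_v @ f_next - w_v @ f   # TD error

    # Softmax actor and linear critic (semi-gradient actor-critic updates).
    prefs = w_pi @ f
    p = np.exp(prefs - prefs.max())
    p /= p.sum()
    w_v += lr * delta * f
    w_pi[action] += lr * delta * f
    w_pi -= lr * delta * np.outer(p, f)

    # Field parameters follow the gradient of the critic's value estimate,
    # scaled by the same TD error: amplitude, center, and width are all plastic.
    g = lr * delta * w_v * f
    amp += g / np.maximum(amp, 1e-8)
    mu += g * (x - mu) / sigma ** 2
    sigma += g * (x - mu) ** 2 / sigma ** 3
    return delta

Under these assumptions, fields whose activity contributes to positively rewarded value estimates grow, shift, and widen toward rewarded locations, which is intended to mirror the qualitative reorganization described above.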

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-kumar25a,
  title     = {A Model of Place Field Reorganization During Reward Maximization},
  author    = {Kumar, M Ganesh and Bordelon, Blake and Zavatone-Veth, Jacob A and Pehlevan, Cengiz},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {31892--31929},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kumar25a/kumar25a.pdf},
  url       = {https://proceedings.mlr.press/v267/kumar25a.html},
  abstract  = {When rodents learn to navigate in a novel environment, a high density of place fields emerges at reward locations, fields elongate against the trajectory, and individual fields change spatial selectivity while demonstrating stable behavior. Why place fields demonstrate these characteristic phenomena during learning remains elusive. We develop a normative framework using a reward maximization objective, whereby the temporal difference (TD) error drives place field reorganization to improve policy learning. Place fields are modeled using Gaussian radial basis functions to represent states in an environment, and directly synapse to an actor-critic for policy learning. Each field’s amplitude, center, and width, as well as downstream weights, are updated online at each time step to maximize rewards. We demonstrate that this framework unifies three disparate phenomena observed in navigation experiments. Furthermore, we show that these place field phenomena improve policy convergence when learning to navigate to a single target and relearning multiple new targets. To conclude, we develop a simple normative model that recapitulates several aspects of hippocampal place field learning dynamics and unifies mechanisms to offer testable predictions for future experiments.}
}
Endnote
%0 Conference Paper
%T A Model of Place Field Reorganization During Reward Maximization
%A M Ganesh Kumar
%A Blake Bordelon
%A Jacob A Zavatone-Veth
%A Cengiz Pehlevan
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kumar25a
%I PMLR
%P 31892--31929
%U https://proceedings.mlr.press/v267/kumar25a.html
%V 267
%X When rodents learn to navigate in a novel environment, a high density of place fields emerges at reward locations, fields elongate against the trajectory, and individual fields change spatial selectivity while demonstrating stable behavior. Why place fields demonstrate these characteristic phenomena during learning remains elusive. We develop a normative framework using a reward maximization objective, whereby the temporal difference (TD) error drives place field reorganization to improve policy learning. Place fields are modeled using Gaussian radial basis functions to represent states in an environment, and directly synapse to an actor-critic for policy learning. Each field’s amplitude, center, and width, as well as downstream weights, are updated online at each time step to maximize rewards. We demonstrate that this framework unifies three disparate phenomena observed in navigation experiments. Furthermore, we show that these place field phenomena improve policy convergence when learning to navigate to a single target and relearning multiple new targets. To conclude, we develop a simple normative model that recapitulates several aspects of hippocampal place field learning dynamics and unifies mechanisms to offer testable predictions for future experiments.
APA
Kumar, M.G., Bordelon, B., Zavatone-Veth, J.A. & Pehlevan, C. (2025). A Model of Place Field Reorganization During Reward Maximization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:31892-31929. Available from https://proceedings.mlr.press/v267/kumar25a.html.
