The Geometry of Nonlinear Reinforcement Learning

Nikola Milosevic; Nico Scherf

Proceedings of the Geometry, Topology, and Machine Learning Workshop, PMLR 325:215-239, 2026.

Abstract

Reward maximization, safe exploration, and intrinsic motivation are often studied as separate objectives in reinforcement learning (RL). We present a unified geometric framework that views these goals as instances of a single optimization problem on the space of achievable long-term behavior in an environment. Within this framework, classical methods such as policy mirror descent, natural policy gradient, and trust-region algorithms naturally generalize to nonlinear utilities and convex constraints. We illustrate how this perspective captures robustness, safety, exploration, and diversity objectives, and outline open challenges at the interface of geometry and deep RL.

Cite this Paper

BibTeX

@InProceedings{pmlr-v325-milosevic26a,
  title = 	 {The Geometry of Nonlinear Reinforcement Learning},
  author =       {Milosevic, Nikola and Scherf, Nico},
  booktitle = 	 {Proceedings of the Geometry, Topology, and Machine Learning Workshop},
  pages = 	 {215--239},
  year = 	 {2026},
  editor = 	 {Bleher, Michael and Jensen, Freya and Maier, Levin and Taha, Diaaeldin and Wienhard, Anna},
  volume = 	 {325},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10--14 Nov},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v325/main/assets/milosevic26a/milosevic26a.pdf},
  url = 	 {https://proceedings.mlr.press/v325/milosevic26a.html},
  abstract = 	 {Reward maximization, safe exploration, and intrinsic motivation are often studied as separate objectives in reinforcement learning (RL). We present a unified geometric framework that views these goals as instances of a single optimization problem on the space of achievable long-term behavior in an environment. Within this framework, classical methods such as policy mirror descent, natural policy gradient, and trust-region algorithms naturally generalize to nonlinear utilities and convex constraints. We illustrate how this perspective captures robustness, safety, exploration, and diversity objectives, and outline open challenges at the interface of geometry and deep RL.}
}

Endnote

%0 Conference Paper
%T The Geometry of Nonlinear Reinforcement Learning
%A Nikola Milosevic
%A Nico Scherf
%B Proceedings of the Geometry, Topology, and Machine Learning Workshop
%C Proceedings of Machine Learning Research
%D 2026
%E Michael Bleher
%E Freya Jensen
%E Levin Maier
%E Diaaeldin Taha
%E Anna Wienhard
%F pmlr-v325-milosevic26a
%I PMLR
%P 215--239
%U https://proceedings.mlr.press/v325/milosevic26a.html
%V 325
%X Reward maximization, safe exploration, and intrinsic motivation are often studied as separate objectives in reinforcement learning (RL). We present a unified geometric framework that views these goals as instances of a single optimization problem on the space of achievable long-term behavior in an environment. Within this framework, classical methods such as policy mirror descent, natural policy gradient, and trust-region algorithms naturally generalize to nonlinear utilities and convex constraints. We illustrate how this perspective captures robustness, safety, exploration, and diversity objectives, and outline open challenges at the interface of geometry and deep RL.

APA

Milosevic, N. & Scherf, N. (2026). The Geometry of Nonlinear Reinforcement Learning. Proceedings of the Geometry, Topology, and Machine Learning Workshop, in Proceedings of Machine Learning Research 325:215-239. Available from https://proceedings.mlr.press/v325/milosevic26a.html.
