QPRL : Learning Optimal Policies with Quasi-Potential Functions for Asymmetric Traversal

Jumman Hossain, Nirmalya Roy
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:23828-23845, 2025.

Abstract

Reinforcement learning (RL) in real-world tasks such as robotic navigation often encounters environments with asymmetric traversal costs, where actions like climbing uphill versus moving downhill incur distinctly different penalties, or transitions may become irreversible. While recent quasimetric RL methods relax symmetry assumptions, they typically do not explicitly account for path-dependent costs or provide rigorous safety guarantees. We introduce Quasi-Potential Reinforcement Learning (QPRL), a novel framework that explicitly decomposes asymmetric traversal costs into a path-independent potential function ($\Phi$) and a path-dependent residual ($\Psi$). This decomposition allows efficient learning and stable policy optimization via a Lyapunov-based safety mechanism. Theoretically, we prove that QPRL achieves convergence with improved sample complexity of $\tilde{O}(\sqrt{T})$, surpassing prior quasimetric RL bounds of $\tilde{O}(T)$. Empirically, our experiments demonstrate that QPRL attains state-of-the-art performance across various navigation and control tasks, significantly reducing irreversible constraint violations by approximately $4\times$ compared to baselines.
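For intuition, one natural way to formalize such a decomposition (a sketch for illustration only; the exact formulation and constraints used in the paper may differ) is to write the cost of a single transition $s \to s'$ as a telescoping potential term plus a non-negative residual:

$$c(s, s') \;=\; \underbrace{\Phi(s') - \Phi(s)}_{\text{path-independent}} \;+\; \underbrace{\Psi(s, s')}_{\text{path-dependent},\; \Psi \ge 0},$$

so that along any trajectory $s_0 \to s_1 \to \cdots \to s_T$ the potential terms telescope to $\Phi(s_T) - \Phi(s_0)$, while only the accumulated residual $\sum_t \Psi(s_t, s_{t+1})$ depends on the route taken; the asymmetry between $s \to s'$ and $s' \to s$ is then carried by the sign flip of the potential difference together with the two residual terms.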

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-hossain25a,
  title     = {{QPRL} : Learning Optimal Policies with Quasi-Potential Functions for Asymmetric Traversal},
  author    = {Hossain, Jumman and Roy, Nirmalya},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {23828--23845},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/hossain25a/hossain25a.pdf},
  url       = {https://proceedings.mlr.press/v267/hossain25a.html},
  abstract  = {Reinforcement learning (RL) in real-world tasks such as robotic navigation often encounters environments with asymmetric traversal costs, where actions like climbing uphill versus moving downhill incur distinctly different penalties, or transitions may become irreversible. While recent quasimetric RL methods relax symmetry assumptions, they typically do not explicitly account for path-dependent costs or provide rigorous safety guarantees. We introduce Quasi-Potential Reinforcement Learning (QPRL), a novel framework that explicitly decomposes asymmetric traversal costs into a path-independent potential function ($\Phi$) and a path-dependent residual ($\Psi$). This decomposition allows efficient learning and stable policy optimization via a Lyapunov-based safety mechanism. Theoretically, we prove that QPRL achieves convergence with improved sample complexity of $\tilde{O}(\sqrt{T})$, surpassing prior quasimetric RL bounds of $\tilde{O}(T)$. Empirically, our experiments demonstrate that QPRL attains state-of-the-art performance across various navigation and control tasks, significantly reducing irreversible constraint violations by approximately $4\times$ compared to baselines.}
}
Endnote
%0 Conference Paper
%T QPRL : Learning Optimal Policies with Quasi-Potential Functions for Asymmetric Traversal
%A Jumman Hossain
%A Nirmalya Roy
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-hossain25a
%I PMLR
%P 23828--23845
%U https://proceedings.mlr.press/v267/hossain25a.html
%V 267
%X Reinforcement learning (RL) in real-world tasks such as robotic navigation often encounters environments with asymmetric traversal costs, where actions like climbing uphill versus moving downhill incur distinctly different penalties, or transitions may become irreversible. While recent quasimetric RL methods relax symmetry assumptions, they typically do not explicitly account for path-dependent costs or provide rigorous safety guarantees. We introduce Quasi-Potential Reinforcement Learning (QPRL), a novel framework that explicitly decomposes asymmetric traversal costs into a path-independent potential function ($\Phi$) and a path-dependent residual ($\Psi$). This decomposition allows efficient learning and stable policy optimization via a Lyapunov-based safety mechanism. Theoretically, we prove that QPRL achieves convergence with improved sample complexity of $\tilde{O}(\sqrt{T})$, surpassing prior quasimetric RL bounds of $\tilde{O}(T)$. Empirically, our experiments demonstrate that QPRL attains state-of-the-art performance across various navigation and control tasks, significantly reducing irreversible constraint violations by approximately $4\times$ compared to baselines.
APA
Hossain, J. & Roy, N. (2025). QPRL : Learning Optimal Policies with Quasi-Potential Functions for Asymmetric Traversal. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:23828-23845. Available from https://proceedings.mlr.press/v267/hossain25a.html.
