HALO : Human Preference Aligned Offline Reward Learning for Robot Navigation

Gershom Seneviratne, Jianyu An, Sahire Ellahy, Kasun Weerakoon, Mohamed Bashir Elnoor, Jonathan Deepak Kannan, Amogha Thalihalla Sunil, Dinesh Manocha
Proceedings of The 9th Conference on Robot Learning, PMLR 305:3267-3284, 2025.

Abstract

In this paper, we introduce HALO, a novel offline reward learning algorithm that distills human navigation intuition into a vision-based reward function for robot navigation. HALO learns a reward model from offline data, leveraging expert trajectories collected from mobile robots. During training, actions are randomly sampled from the action space around the expert action and ranked using a Boltzmann probability distribution that combines their distance to the expert action with human preference scores derived from intuitive navigation queries based on the corresponding egocentric camera feed. These scores establish preference rankings, enabling the training of a novel reward model with a Plackett-Luce loss, which allows for preference-driven navigation. To demonstrate the effectiveness of HALO, we deploy its reward model in two downstream applications: (i) an offline-learned policy trained directly on the HALO-derived rewards, and (ii) a model predictive control (MPC)-based planner that incorporates the HALO reward as an additional cost term. This showcases the versatility of HALO across both learning-based and classical navigation frameworks. Our real-world deployments on a Clearpath Husky across multiple scenarios demonstrate that policies trained with HALO outperform state-of-the-art methods in success rate and normalized trajectory length while maintaining a lower Fréchet distance to the human expert trajectories.
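
The abstract describes two algorithmic ingredients: ranking actions sampled around the expert action with a Boltzmann distribution that combines distance to the expert action with human preference scores, and training the reward model with a Plackett-Luce ranking loss. The sketch below illustrates only those two ideas; it is a minimal illustrative example, not the authors' implementation. The toy reward network, the way distance and preference scores are combined into Boltzmann logits, and all names and hyperparameters (RewardModel, boltzmann_ranking, lambda_pref, temperature, the feature and action dimensions) are assumptions made for this example.

```python
# Minimal sketch (not the authors' code): Boltzmann ranking of sampled actions
# and a Plackett-Luce loss for reward-model training. All architecture choices
# and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def boltzmann_ranking(actions, expert_action, pref_scores,
                      lambda_pref=1.0, temperature=0.5):
    """Rank candidate actions: closer to the expert action and higher
    human-preference score -> higher Boltzmann probability."""
    dist = torch.linalg.norm(actions - expert_action, dim=-1)    # (K,)
    logits = (-dist + lambda_pref * pref_scores) / temperature   # (K,)
    probs = F.softmax(logits, dim=-1)
    ranking = torch.argsort(probs, descending=True)              # most preferred first
    return ranking, probs


class RewardModel(nn.Module):
    """Toy reward network mapping an (image-feature, action) pair to a scalar reward."""
    def __init__(self, feat_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats, actions):
        return self.net(torch.cat([feats, actions], dim=-1)).squeeze(-1)


def plackett_luce_loss(rewards, ranking):
    """Negative log-likelihood of the given ranking under the Plackett-Luce
    model, using the predicted rewards as item utilities."""
    ordered = rewards[ranking]                                   # rewards in preferred order
    # log P(ranking) = sum_k [ r_k - logsumexp(r_k, ..., r_K) ]
    rev_lse = torch.logcumsumexp(ordered.flip(0), dim=0).flip(0)
    return -(ordered - rev_lse).sum()


if __name__ == "__main__":
    torch.manual_seed(0)
    K, feat_dim, act_dim = 8, 32, 2
    feats = torch.randn(1, feat_dim).expand(K, -1)               # shared egocentric image feature
    expert_action = torch.tensor([0.6, 0.1])
    actions = expert_action + 0.3 * torch.randn(K, act_dim)      # samples around the expert action
    pref_scores = torch.rand(K)                                  # stand-in for query-derived scores

    ranking, _ = boltzmann_ranking(actions, expert_action, pref_scores)
    model = RewardModel(feat_dim, act_dim)
    loss = plackett_luce_loss(model(feats, actions), ranking)
    loss.backward()
    print(f"Plackett-Luce loss: {loss.item():.3f}")
```

In the paper, the preference scores come from navigation queries over the egocentric camera feed, and the learned reward model is then consumed by an offline-learned policy or an MPC planner; the sketch above only shows the ranking-and-loss machinery those components build on.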

Cite this Paper


BibTeX
@InProceedings{pmlr-v305-seneviratne25a,
  title     = {HALO : Human Preference Aligned Offline Reward Learning for Robot Navigation},
  author    = {Seneviratne, Gershom and An, Jianyu and Ellahy, Sahire and Weerakoon, Kasun and Elnoor, Mohamed Bashir and Kannan, Jonathan Deepak and Sunil, Amogha Thalihalla and Manocha, Dinesh},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  pages     = {3267--3284},
  year      = {2025},
  editor    = {Lim, Joseph and Song, Shuran and Park, Hae-Won},
  volume    = {305},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Sep},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v305/main/assets/seneviratne25a/seneviratne25a.pdf},
  url       = {https://proceedings.mlr.press/v305/seneviratne25a.html}
}
Endnote
%0 Conference Paper
%T HALO : Human Preference Aligned Offline Reward Learning for Robot Navigation
%A Gershom Seneviratne
%A Jianyu An
%A Sahire Ellahy
%A Kasun Weerakoon
%A Mohamed Bashir Elnoor
%A Jonathan Deepak Kannan
%A Amogha Thalihalla Sunil
%A Dinesh Manocha
%B Proceedings of The 9th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Joseph Lim
%E Shuran Song
%E Hae-Won Park
%F pmlr-v305-seneviratne25a
%I PMLR
%P 3267--3284
%U https://proceedings.mlr.press/v305/seneviratne25a.html
%V 305
APA
Seneviratne, G., An, J., Ellahy, S., Weerakoon, K., Elnoor, M.B., Kannan, J.D., Sunil, A.T. & Manocha, D. (2025). HALO : Human Preference Aligned Offline Reward Learning for Robot Navigation. Proceedings of The 9th Conference on Robot Learning, in Proceedings of Machine Learning Research 305:3267-3284. Available from https://proceedings.mlr.press/v305/seneviratne25a.html.