Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations

Cevahir Koprulu, Po-Han Li, Tianyu Qiu, Ruihan Zhao, Tyler Westenbroek, David Fridovich-Keil, Sandeep Chinchali, Ufuk Topcu
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:894-906, 2025.

Abstract

Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.
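To make the idea of dense reward synthesis concrete, the sketch below illustrates one generic way such a signal can be built: potential-based reward shaping (Ng et al., 1999) around a progress estimate fitted to expert demonstration states. This is an illustrative assumption, not the framework proposed in the paper; the function names (`fit_potential`, `shaped_reward`) and the nearest-neighbor potential are hypothetical stand-ins.

```python
import numpy as np

def fit_potential(demo_states):
    """Hypothetical potential learned from expert demonstrations:
    each demo state is scored by how far along its trajectory it lies,
    so states nearer the goal receive higher potential."""
    demo_states = np.asarray(demo_states, dtype=float)
    progress = np.linspace(0.0, 1.0, len(demo_states))  # 0 at start, 1 at goal

    def potential(state):
        # Nearest-neighbor lookup: inherit the progress score of the
        # closest demonstration state (a crude, dynamics-agnostic proxy).
        dists = np.linalg.norm(demo_states - np.asarray(state, dtype=float), axis=1)
        return progress[np.argmin(dists)]

    return potential

def shaped_reward(sparse_reward, state, next_state, potential, gamma=0.99):
    """Potential-based shaping: a dense term added to the sparse task reward.
    This form of shaping preserves the optimal policy of the original task."""
    return sparse_reward + gamma * potential(next_state) - potential(state)

if __name__ == "__main__":
    # Toy 2-D example: a single straight-line demonstration toward the goal (1, 1).
    demo = np.linspace([0.0, 0.0], [1.0, 1.0], num=20)
    phi = fit_potential(demo)

    s, s_next = np.array([0.1, 0.1]), np.array([0.2, 0.2])
    r_sparse = 0.0  # goal not yet reached, so the task reward is zero
    print(shaped_reward(r_sparse, s, s_next, phi))  # small positive shaping bonus
```

A dynamics-aware variant would replace the Euclidean nearest-neighbor lookup with a distance or value estimate learned from the task-agnostic prior data; the sketch keeps the simpler proxy only for brevity.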

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-koprulu25a,
  title     = {Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations},
  author    = {Koprulu, Cevahir and Li, Po-Han and Qiu, Tianyu and Zhao, Ruihan and Westenbroek, Tyler and Fridovich-Keil, David and Chinchali, Sandeep and Topcu, Ufuk},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {894--906},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/koprulu25a/koprulu25a.pdf},
  url       = {https://proceedings.mlr.press/v283/koprulu25a.html},
  abstract  = {Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.}
}
Endnote
%0 Conference Paper
%T Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations
%A Cevahir Koprulu
%A Po-Han Li
%A Tianyu Qiu
%A Ruihan Zhao
%A Tyler Westenbroek
%A David Fridovich-Keil
%A Sandeep Chinchali
%A Ufuk Topcu
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-koprulu25a
%I PMLR
%P 894--906
%U https://proceedings.mlr.press/v283/koprulu25a.html
%V 283
%X Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.
APA
Koprulu, C., Li, P., Qiu, T., Zhao, R., Westenbroek, T., Fridovich-Keil, D., Chinchali, S., & Topcu, U. (2025). Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:894-906. Available from https://proceedings.mlr.press/v283/koprulu25a.html.