RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards

Fatemeh Zargarbashi, Jin Cheng, Dongho Kang, Robert Sumner, Stelian Coros
Proceedings of The 8th Conference on Robot Learning, PMLR 270:916-932, 2025.

Abstract

This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with specific time-to-arrivals. Throughout simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.
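
The multi-critic mixing described above can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal, hypothetical example assuming standard PPO-style generalized advantage estimation, one critic per reward group (e.g., dense style/regularization rewards versus sparse keyframe rewards), and per-group advantage normalization before the advantages are combined. All names, weights, and toy values are illustrative assumptions.

import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one reward stream.
    `values` holds T+1 critic predictions (the last entry is the bootstrap value)."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        adv[t] = last
    return adv

def mixed_advantage(reward_groups, value_preds, dones, weights=None):
    """One critic per reward group; advantages are normalized per group before
    being summed, so dense and sparse terms need no manual reward-scale balancing."""
    weights = weights or {name: 1.0 for name in reward_groups}
    total = None
    for name, rewards in reward_groups.items():
        adv = gae(np.asarray(rewards), np.asarray(value_preds[name]), dones)
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)  # per-critic normalization
        total = weights[name] * adv if total is None else total + weights[name] * adv
    return total

# Toy rollout: a dense reward at every step and a sparse reward only when a
# keyframe is reached at the final step (all numbers are made up).
T = 5
dones = np.array([False, False, False, False, True])
groups = {
    "dense_style": np.array([0.2, 0.3, 0.1, 0.4, 0.2]),
    "sparse_keyframe": np.array([0.0, 0.0, 0.0, 0.0, 1.0]),
}
critic_values = {name: np.zeros(T + 1) for name in groups}  # stand-in critic outputs
advantage = mixed_advantage(groups, critic_values, dones)

In this sketch, the per-group normalization is what removes the need to balance reward magnitudes by hand, which is consistent with the reduced hyperparameter-tuning effort the paper reports for the multi-critic method.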

Cite this Paper


BibTeX
@InProceedings{pmlr-v270-zargarbashi25a,
  title     = {RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards},
  author    = {Zargarbashi, Fatemeh and Cheng, Jin and Kang, Dongho and Sumner, Robert and Coros, Stelian},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  pages     = {916--932},
  year      = {2025},
  editor    = {Agrawal, Pulkit and Kroemer, Oliver and Burgard, Wolfram},
  volume    = {270},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--09 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v270/main/assets/zargarbashi25a/zargarbashi25a.pdf},
  url       = {https://proceedings.mlr.press/v270/zargarbashi25a.html},
  abstract  = {This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with specific time-to-arrivals. Throughout simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.}
}
Endnote
%0 Conference Paper
%T RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards
%A Fatemeh Zargarbashi
%A Jin Cheng
%A Dongho Kang
%A Robert Sumner
%A Stelian Coros
%B Proceedings of The 8th Conference on Robot Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Pulkit Agrawal
%E Oliver Kroemer
%E Wolfram Burgard
%F pmlr-v270-zargarbashi25a
%I PMLR
%P 916--932
%U https://proceedings.mlr.press/v270/zargarbashi25a.html
%V 270
%X This paper presents a novel learning-based control framework that uses keyframing to incorporate high-level objectives in natural locomotion for legged robots. These high-level objectives are specified as a variable number of partial or complete pose targets that are spaced arbitrarily in time. Our proposed framework utilizes a multi-critic reinforcement learning algorithm to effectively handle the mixture of dense and sparse rewards. Additionally, it employs a transformer-based encoder to accommodate a variable number of input targets, each associated with specific time-to-arrivals. Throughout simulation and hardware experiments, we demonstrate that our framework can effectively satisfy the target keyframe sequence at the required times. In the experiments, the multi-critic method significantly reduces the effort of hyperparameter tuning compared to the standard single-critic alternative. Moreover, the proposed transformer-based architecture enables robots to anticipate future goals, which results in quantitative improvements in their ability to reach their targets.
APA
Zargarbashi, F., Cheng, J., Kang, D., Sumner, R. & Coros, S. (2025). RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards. Proceedings of The 8th Conference on Robot Learning, in Proceedings of Machine Learning Research 270:916-932. Available from https://proceedings.mlr.press/v270/zargarbashi25a.html.
