Leveraging Mutual Information for Asymmetric Learning under Partial Observability
Proceedings of The 8th Conference on Robot Learning, PMLR 270:4546-4572, 2025.
Abstract
Even though partial observability is prevalent in robotics, most reinforcement learning studies avoid it because of the difficulty of learning a policy that can efficiently memorize past events and seek out information. Fortunately, in many cases, learning can be done in an asymmetric setting, where states are available during training but not during execution. Prior studies often use the state to indirectly influence the training of a history-based actor (actor-critic methods) or a history-based critic (value-based methods). Instead, we propose using state-observation and state-history mutual information to improve the agent's architecture and its ability to seek information and memorize efficiently, via intrinsic rewards and an auxiliary task. Through extensive experiments, our method outperforms strong baselines and achieves successful sim-to-real transfer to a real robot.
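To make the idea of a state-history mutual information objective more concrete, the following is a minimal, hypothetical sketch of a contrastive (InfoNCE-style) lower bound that could serve both as an auxiliary task and as a per-sample intrinsic reward. It is not the paper's actual implementation: the class name `InfoNCEMIEstimator`, the encoder architectures, the bilinear critic, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InfoNCEMIEstimator(nn.Module):
    """Contrastive (InfoNCE) lower bound on I(state; history).

    Hypothetical sketch for illustration only; encoder shapes and the
    bilinear critic are assumptions, not the paper's exact design.
    """

    def __init__(self, state_dim: int, history_dim: int, embed_dim: int = 128):
        super().__init__()
        self.state_enc = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.history_enc = nn.Sequential(
            nn.Linear(history_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        # Bilinear critic scoring every (state, history) pair in the batch.
        self.W = nn.Parameter(0.01 * torch.randn(embed_dim, embed_dim))

    def scores(self, states: torch.Tensor, histories: torch.Tensor) -> torch.Tensor:
        zs = self.state_enc(states)        # (B, d)
        zh = self.history_enc(histories)   # (B, d)
        return zs @ self.W @ zh.t()        # (B, B) pairwise critic scores

    def auxiliary_loss(self, states: torch.Tensor, histories: torch.Tensor) -> torch.Tensor:
        # InfoNCE: the matching pair sits on the diagonal, other batch
        # entries act as negatives; minimizing this tightens the MI bound.
        logits = self.scores(states, histories)
        labels = torch.arange(logits.size(0), device=logits.device)
        return F.cross_entropy(logits, labels)

    def intrinsic_reward(self, states: torch.Tensor, histories: torch.Tensor) -> torch.Tensor:
        # Per-sample reward: how well each history identifies its own state
        # relative to the rest of the batch (detached so it only shapes rewards).
        with torch.no_grad():
            logits = self.scores(states, histories)
            log_probs = logits.log_softmax(dim=1).diagonal()
        return log_probs + torch.log(torch.tensor(float(logits.size(0))))
```

In this sketch, `auxiliary_loss` would be added to the RL objective during training (when privileged states are available), while `intrinsic_reward` would bonus transitions whose histories are informative about the underlying state; how the paper actually combines these signals is described in the full text.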