Learning Fused State Representations for Control from Multi-View Observations

Zeyu Wang, Yao-Hui Li, Xin Li, Hongyu Zang, Romain Laroche, Riashat Islam
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:63365-63386, 2025.

Abstract

Multi-View Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive environment with greater effectiveness and precision. Recent advancements in MVRL focus on extracting latent representations from multiview observations and leveraging them in control tasks. However, it is not straightforward to learn compact and task-relevant representations, particularly in the presence of redundancy, distracting information, or missing views. In this paper, we propose Multi-view Fusion State for Control (MFSC), firstly incorporating bisimulation metric learning into MVRL to learn task-relevant representations. Furthermore, we propose a multiview-based mask and latent reconstruction auxiliary task that exploits shared information across views and improves MFSC’s robustness in missing views by introducing a mask token. Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance. The project code is available at https://github.com/zpwdev/MFSC.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-wang25ay, title = {Learning Fused State Representations for Control from Multi-View Observations}, author = {Wang, Zeyu and Li, Yao-Hui and Li, Xin and Zang, Hongyu and Laroche, Romain and Islam, Riashat}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {63365--63386}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/wang25ay/wang25ay.pdf}, url = {https://proceedings.mlr.press/v267/wang25ay.html}, abstract = {Multi-View Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive environment with greater effectiveness and precision. Recent advancements in MVRL focus on extracting latent representations from multiview observations and leveraging them in control tasks. However, it is not straightforward to learn compact and task-relevant representations, particularly in the presence of redundancy, distracting information, or missing views. In this paper, we propose Multi-view Fusion State for Control (MFSC), firstly incorporating bisimulation metric learning into MVRL to learn task-relevant representations. Furthermore, we propose a multiview-based mask and latent reconstruction auxiliary task that exploits shared information across views and improves MFSC’s robustness in missing views by introducing a mask token. Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance. The project code is available at https://github.com/zpwdev/MFSC.} }
Endnote
%0 Conference Paper %T Learning Fused State Representations for Control from Multi-View Observations %A Zeyu Wang %A Yao-Hui Li %A Xin Li %A Hongyu Zang %A Romain Laroche %A Riashat Islam %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-wang25ay %I PMLR %P 63365--63386 %U https://proceedings.mlr.press/v267/wang25ay.html %V 267 %X Multi-View Reinforcement Learning (MVRL) seeks to provide agents with multi-view observations, enabling them to perceive environment with greater effectiveness and precision. Recent advancements in MVRL focus on extracting latent representations from multiview observations and leveraging them in control tasks. However, it is not straightforward to learn compact and task-relevant representations, particularly in the presence of redundancy, distracting information, or missing views. In this paper, we propose Multi-view Fusion State for Control (MFSC), firstly incorporating bisimulation metric learning into MVRL to learn task-relevant representations. Furthermore, we propose a multiview-based mask and latent reconstruction auxiliary task that exploits shared information across views and improves MFSC’s robustness in missing views by introducing a mask token. Extensive experimental results demonstrate that our method outperforms existing approaches in MVRL tasks. Even in more realistic scenarios with interference or missing views, MFSC consistently maintains high performance. The project code is available at https://github.com/zpwdev/MFSC.
APA
Wang, Z., Li, Y., Li, X., Zang, H., Laroche, R. & Islam, R.. (2025). Learning Fused State Representations for Control from Multi-View Observations. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:63365-63386 Available from https://proceedings.mlr.press/v267/wang25ay.html.

Related Material