Prompt-based Visual Alignment for Zero-shot Policy Transfer

Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, Qicheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:14954-14968, 2024.

Abstract

Overfitting has become one of the main obstacles to real-world applications of reinforcement learning (RL). Existing methods do not provide an explicit semantic constraint for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains; moreover, they require abundant data from multiple domains. To address these issues, we propose prompt-based visual alignment (PVA), a robust framework that mitigates the detrimental domain bias in images for zero-shot policy transfer. Inspired by the observation that a vision-language model (VLM) can serve as a bridge between the text space and the image space, we leverage the semantic information contained in a text sequence as an explicit constraint to train a visual aligner. The visual aligner can thus map images from multiple domains into a unified domain and achieve good generalization performance. To better capture semantic information, prompt tuning is applied to learn a sequence of learnable tokens. With explicit semantic constraints, PVA learns a unified cross-domain representation under limited access to cross-domain data and achieves strong zero-shot generalization on unseen domains. We verify PVA on a vision-based autonomous driving task with the CARLA simulator. Experiments show that the agent generalizes well to unseen domains under limited access to multi-domain data.
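To illustrate the core idea described above, the following is a minimal sketch of the alignment constraint: an aligner maps a domain-specific image feature toward a unified domain, and the training signal is the similarity between the aligned image embedding and the embedding of a learnable prompt, computed through a frozen VLM. Everything here is a stand-in (random-projection "encoders", a linear aligner, dimension `D`); it is not the authors' implementation, which uses a CLIP-style VLM.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # embedding dimension (hypothetical)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for the frozen VLM branches (random projections,
# purely illustrative; the paper uses a pretrained vision-language model).
W_img = rng.normal(size=(D, D))
W_txt = rng.normal(size=(D, D))

def image_encoder(feat):
    """Frozen VLM image branch: project and L2-normalize."""
    return l2norm(feat @ W_img)

def text_encoder(tokens):
    """Frozen VLM text branch: pool the token sequence, project, normalize."""
    return l2norm(tokens.mean(axis=0) @ W_txt)

# Prompt tuning: a short sequence of learnable token embeddings
# (these, not a hand-written sentence, carry the semantic description).
prompt_tokens = rng.normal(size=(4, D))

# Visual aligner: maps a domain-specific image feature toward the unified
# domain. A single linear map here; PVA trains it under the VLM constraint.
aligner = np.eye(D)

def alignment_loss(raw_image_feat):
    """1 - cosine similarity between the aligned image embedding and the
    prompt embedding: the explicit semantic constraint on the aligner."""
    img = image_encoder(raw_image_feat @ aligner)
    txt = text_encoder(prompt_tokens)
    return 1.0 - float(img @ txt)

x = rng.normal(size=(D,))
loss = alignment_loss(x)
print(loss)
```

Training would minimize this loss over both the aligner and the prompt tokens while keeping the VLM frozen; since both embeddings are unit-normalized, the loss lies in [0, 2].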

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-gao24r,
  title = {Prompt-based Visual Alignment for Zero-shot Policy Transfer},
  author = {Gao, Haihan and Zhang, Rui and Yi, Qi and Yao, Hantao and Li, Haochen and Guo, Jiaming and Peng, Shaohui and Gao, Yunkai and Wang, Qicheng and Hu, Xing and Wen, Yuanbo and Zhang, Zihao and Du, Zidong and Li, Ling and Guo, Qi and Chen, Yunji},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {14954--14968},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/gao24r/gao24r.pdf},
  url = {https://proceedings.mlr.press/v235/gao24r.html},
  abstract = {Overfitting has become one of the main obstacles to real-world applications of reinforcement learning (RL). Existing methods do not provide an explicit semantic constraint for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains; moreover, they require abundant data from multiple domains. To address these issues, we propose prompt-based visual alignment (PVA), a robust framework that mitigates the detrimental domain bias in images for zero-shot policy transfer. Inspired by the observation that a vision-language model (VLM) can serve as a bridge between the text space and the image space, we leverage the semantic information contained in a text sequence as an explicit constraint to train a visual aligner. The visual aligner can thus map images from multiple domains into a unified domain and achieve good generalization performance. To better capture semantic information, prompt tuning is applied to learn a sequence of learnable tokens. With explicit semantic constraints, PVA learns a unified cross-domain representation under limited access to cross-domain data and achieves strong zero-shot generalization on unseen domains. We verify PVA on a vision-based autonomous driving task with the CARLA simulator. Experiments show that the agent generalizes well to unseen domains under limited access to multi-domain data.}
}
Endnote
%0 Conference Paper
%T Prompt-based Visual Alignment for Zero-shot Policy Transfer
%A Haihan Gao
%A Rui Zhang
%A Qi Yi
%A Hantao Yao
%A Haochen Li
%A Jiaming Guo
%A Shaohui Peng
%A Yunkai Gao
%A Qicheng Wang
%A Xing Hu
%A Yuanbo Wen
%A Zihao Zhang
%A Zidong Du
%A Ling Li
%A Qi Guo
%A Yunji Chen
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-gao24r
%I PMLR
%P 14954--14968
%U https://proceedings.mlr.press/v235/gao24r.html
%V 235
%X Overfitting has become one of the main obstacles to real-world applications of reinforcement learning (RL). Existing methods do not provide an explicit semantic constraint for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains; moreover, they require abundant data from multiple domains. To address these issues, we propose prompt-based visual alignment (PVA), a robust framework that mitigates the detrimental domain bias in images for zero-shot policy transfer. Inspired by the observation that a vision-language model (VLM) can serve as a bridge between the text space and the image space, we leverage the semantic information contained in a text sequence as an explicit constraint to train a visual aligner. The visual aligner can thus map images from multiple domains into a unified domain and achieve good generalization performance. To better capture semantic information, prompt tuning is applied to learn a sequence of learnable tokens. With explicit semantic constraints, PVA learns a unified cross-domain representation under limited access to cross-domain data and achieves strong zero-shot generalization on unseen domains. We verify PVA on a vision-based autonomous driving task with the CARLA simulator. Experiments show that the agent generalizes well to unseen domains under limited access to multi-domain data.
APA
Gao, H., Zhang, R., Yi, Q., Yao, H., Li, H., Guo, J., Peng, S., Gao, Y., Wang, Q., Hu, X., Wen, Y., Zhang, Z., Du, Z., Li, L., Guo, Q. & Chen, Y. (2024). Prompt-based Visual Alignment for Zero-shot Policy Transfer. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:14954-14968. Available from https://proceedings.mlr.press/v235/gao24r.html.

Related Material