AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors

Yucen Wang, Shenghua Wan, Le Gan, Shuai Feng, De-Chuan Zhan
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:51546-51568, 2024.

Abstract

Model-based methods have significantly contributed to distinguishing task-irrelevant distractors for visual control. However, prior research has primarily focused on heterogeneous distractors like noisy background videos, leaving homogeneous distractors that closely resemble controllable agents largely unexplored, which poses significant challenges to existing methods. To tackle this problem, we propose Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present a new algorithm named implicit Action-informed Diverse visual Distractors Distinguisher (AD3), that leverages the action inferred by IAG to train separated world models. Implicit actions effectively capture the behavior of background distractors, aiding in distinguishing the task-irrelevant components, and the agent can optimize the policy within the task-relevant state space. Our method achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors. The indispensable role of implicit actions learned by IAG is also empirically validated.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-wang24bq,
  title = {{AD}3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors},
  author = {Wang, Yucen and Wan, Shenghua and Gan, Le and Feng, Shuai and Zhan, De-Chuan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {51546--51568},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/wang24bq/wang24bq.pdf},
  url = {https://proceedings.mlr.press/v235/wang24bq.html},
  abstract = {Model-based methods have significantly contributed to distinguishing task-irrelevant distractors for visual control. However, prior research has primarily focused on heterogeneous distractors like noisy background videos, leaving homogeneous distractors that closely resemble controllable agents largely unexplored, which poses significant challenges to existing methods. To tackle this problem, we propose Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present a new algorithm named implicit Action-informed Diverse visual Distractors Distinguisher (AD3), that leverages the action inferred by IAG to train separated world models. Implicit actions effectively capture the behavior of background distractors, aiding in distinguishing the task-irrelevant components, and the agent can optimize the policy within the task-relevant state space. Our method achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors. The indispensable role of implicit actions learned by IAG is also empirically validated.}
}
Endnote
%0 Conference Paper
%T AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
%A Yucen Wang
%A Shenghua Wan
%A Le Gan
%A Shuai Feng
%A De-Chuan Zhan
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-wang24bq
%I PMLR
%P 51546--51568
%U https://proceedings.mlr.press/v235/wang24bq.html
%V 235
%X Model-based methods have significantly contributed to distinguishing task-irrelevant distractors for visual control. However, prior research has primarily focused on heterogeneous distractors like noisy background videos, leaving homogeneous distractors that closely resemble controllable agents largely unexplored, which poses significant challenges to existing methods. To tackle this problem, we propose Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present a new algorithm named implicit Action-informed Diverse visual Distractors Distinguisher (AD3), that leverages the action inferred by IAG to train separated world models. Implicit actions effectively capture the behavior of background distractors, aiding in distinguishing the task-irrelevant components, and the agent can optimize the policy within the task-relevant state space. Our method achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors. The indispensable role of implicit actions learned by IAG is also empirically validated.
APA
Wang, Y., Wan, S., Gan, L., Feng, S. & Zhan, D. (2024). AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:51546-51568. Available from https://proceedings.mlr.press/v235/wang24bq.html.
