ASAP: Attention-Based State Space Abstraction for Policy Summarization
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:137-152, 2024.
Abstract
Deep reinforcement learning (RL) has shown remarkable performance, but end-users do not understand how the system solves tasks due to the black-box nature of neural networks. Many methods from explainable machine learning have been adapted to RL; however, they do not address the unique challenge of explaining actions’ short-term and long-term consequences. This work introduces a new perspective on understanding RL policies by clustering states into abstract states using attention maps, giving a bird’s-eye view of the policy’s behavior. We learn the attention maps iteratively, jointly with the clustering of states, by masking input features to estimate their importance. In contrast to previous works, whose abstract states are uninterpretable or whose clustering objectives rely on state values that are not intuitive to humans, we leverage only attention maps in the clustering; the policy affects the clustering only indirectly, through the attention maps. This allows us to give global explanations in terms of feature attention, a quantity humans can relate to when the features are interpretable. Experiments demonstrate that our method provides faithful abstractions that capture state semantics, policy behavior, and feature attention. Furthermore, we show that our attention maps can be used to mask state features without affecting policy performance.
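To make the two mechanisms in the abstract concrete, the sketch below illustrates them in simplified form: estimating per-feature attention by masking inputs and measuring the change in the policy's output, then clustering states by their attention maps alone. This is a minimal illustration under assumed choices (a one-shot perturbation estimate and k-means over attention vectors), not the authors' implementation; in particular, the paper learns attention and clusters iteratively and jointly, which this one-shot version omits. All names here (`attention_by_masking`, `abstract_states`, `toy_policy`) are hypothetical.

```python
# Minimal sketch, NOT the authors' code: (1) perturbation-based attention by
# masking one feature at a time, (2) abstract states via k-means over
# attention maps. The baseline value and clustering objective are stand-ins.
import numpy as np
from sklearn.cluster import KMeans

def attention_by_masking(policy, states, baseline=0.0):
    """Estimate each feature's importance per state: mask the feature and
    measure how much the policy's action distribution changes."""
    n, d = states.shape
    base = policy(states)                       # (n, n_actions) action probs
    attn = np.zeros((n, d))
    for j in range(d):
        masked = states.copy()
        masked[:, j] = baseline                 # replace feature j
        attn[:, j] = np.abs(policy(masked) - base).sum(axis=1)
    # Normalize each state's attention map to sum to 1.
    return attn / np.clip(attn.sum(axis=1, keepdims=True), 1e-8, None)

def abstract_states(attn_maps, n_clusters=5, seed=0):
    """Cluster states by attention maps only; the policy influences the
    abstraction solely through the attention it induces."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(attn_maps)
    return labels, km.cluster_centers_          # centers: prototypical attention

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.normal(size=(200, 6))          # toy states with 6 features

    def toy_policy(s):                          # stand-in for a trained policy
        logits = np.stack([s[:, 0] + s[:, 1], s[:, 2] - s[:, 3]], axis=1)
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    attn = attention_by_masking(toy_policy, states)
    labels, centers = abstract_states(attn, n_clusters=3)
    print(labels[:10], centers.round(2))
```

Under this toy policy, features 4 and 5 receive near-zero attention, so states end up grouped by which of the decision-relevant feature pairs dominates, which is the kind of attention-level abstraction the abstract describes.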