Outcome-Based Semifactuals for Reinforcement Learning

Yanzhe Bekkemoen, Helge Langseth
Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:574-589, 2025.

Abstract

Counterfactual explanations in reinforcement learning (RL) aim to answer what-if questions by demonstrating sparse and minimal changes to states, resulting in the probability mass moving from one action to another. Although these explanations are effective in classification tasks that look for the presence of concepts, RL brings new challenges that counterfactual methods need to solve. These challenges include defining state similarity, avoiding out-of-distribution states, and improving discriminative power of explanations. Given a state of interest called the query state, we solve these problems by asking how long the agent can execute the query state action without incurring a negative outcome regarding the expected return. We coin this outcome-based semifactual (OSF) explanation and find the OSF state by simulating trajectories from the query state. The last state in a subtrajectory where we can take the same action as in the query state without incurring a negative outcome is the OSF state. This state is discriminative, plausible, and similar to the query state. It abstracts away unimportant action switching with little explanatory value and shows the boundary between positive and negative outcomes. Qualitatively, we show that our method explains when an agent must switch actions. As a result, it is easier to understand the agent’s behavior. Quantitatively, we demonstrate that our method can increase policy performance and, at the same time, reduce how often the agent switches its action across six environments. The code and trained models are made open source.
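The search the abstract describes, simulating forward from the query state while repeating the query action until the expected return degrades, can be sketched as follows. This is an illustrative reading of the idea only, not the authors' implementation; the names `find_osf_state`, `simulate_step`, and `expected_return`, and the toy cliff environment, are all hypothetical.

```python
def find_osf_state(query_state, query_action, simulate_step,
                   expected_return, tolerance=0.0, max_horizon=100):
    """Roll out `query_action` repeatedly from `query_state` and return
    the last visited state where repeating that action has not yet
    dropped the expected return more than `tolerance` below the query
    state's expected return (i.e., incurred a negative outcome)."""
    baseline = expected_return(query_state)
    osf_state = query_state
    state = query_state
    for _ in range(max_horizon):
        state = simulate_step(state, query_action)
        if expected_return(state) < baseline - tolerance:
            break  # continuing the action now incurs a negative outcome
        osf_state = state  # still safe: this is the latest OSF candidate
    return osf_state

# Toy example: walking right along a line is fine up to position 5;
# past that, the expected return collapses (a "cliff" at position 6).
step = lambda s, a: s + 1
value = lambda s: 1.0 if s <= 5 else -1.0
print(find_osf_state(0, "right", step, value))  # -> 5
```

Here the OSF state is position 5: the agent can keep executing "right" up to that point, and the explanation marks the boundary at which it must switch actions.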

Cite this Paper


BibTeX
@InProceedings{pmlr-v304-bekkemoen25a,
  title     = {Outcome-Based Semifactuals for Reinforcement Learning},
  author    = {Bekkemoen, Yanzhe and Langseth, Helge},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {574--589},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/bekkemoen25a/bekkemoen25a.pdf},
  url       = {https://proceedings.mlr.press/v304/bekkemoen25a.html},
  abstract  = {Counterfactual explanations in reinforcement learning (RL) aim to answer what-if questions by demonstrating sparse and minimal changes to states, resulting in the probability mass moving from one action to another. Although these explanations are effective in classification tasks that look for the presence of concepts, RL brings new challenges that counterfactual methods need to solve. These challenges include defining state similarity, avoiding out-of-distribution states, and improving discriminative power of explanations. Given a state of interest called the query state, we solve these problems by asking how long the agent can execute the query state action without incurring a negative outcome regarding the expected return. We coin this outcome-based semifactual (OSF) explanation and find the OSF state by simulating trajectories from the query state. The last state in a subtrajectory where we can take the same action as in the query state without incurring a negative outcome is the OSF state. This state is discriminative, plausible, and similar to the query state. It abstracts away unimportant action switching with little explanatory value and shows the boundary between positive and negative outcomes. Qualitatively, we show that our method explains when an agent must switch actions. As a result, it is easier to understand the agent’s behavior. Quantitatively, we demonstrate that our method can increase policy performance and, at the same time, reduce how often the agent switches its action across six environments. The code and trained models are made open source.}
}
Endnote
%0 Conference Paper
%T Outcome-Based Semifactuals for Reinforcement Learning
%A Yanzhe Bekkemoen
%A Helge Langseth
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-bekkemoen25a
%I PMLR
%P 574--589
%U https://proceedings.mlr.press/v304/bekkemoen25a.html
%V 304
%X Counterfactual explanations in reinforcement learning (RL) aim to answer what-if questions by demonstrating sparse and minimal changes to states, resulting in the probability mass moving from one action to another. Although these explanations are effective in classification tasks that look for the presence of concepts, RL brings new challenges that counterfactual methods need to solve. These challenges include defining state similarity, avoiding out-of-distribution states, and improving discriminative power of explanations. Given a state of interest called the query state, we solve these problems by asking how long the agent can execute the query state action without incurring a negative outcome regarding the expected return. We coin this outcome-based semifactual (OSF) explanation and find the OSF state by simulating trajectories from the query state. The last state in a subtrajectory where we can take the same action as in the query state without incurring a negative outcome is the OSF state. This state is discriminative, plausible, and similar to the query state. It abstracts away unimportant action switching with little explanatory value and shows the boundary between positive and negative outcomes. Qualitatively, we show that our method explains when an agent must switch actions. As a result, it is easier to understand the agent’s behavior. Quantitatively, we demonstrate that our method can increase policy performance and, at the same time, reduce how often the agent switches its action across six environments. The code and trained models are made open source.
APA
Bekkemoen, Y. & Langseth, H. (2025). Outcome-Based Semifactuals for Reinforcement Learning. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:574-589. Available from https://proceedings.mlr.press/v304/bekkemoen25a.html.