“What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives

Noel Brindise, Vijeth Hebbar, Riya Shah, Cedric Langbort
Proceedings of the 7th Annual Learning for Dynamics & Control Conference, PMLR 283:1194-1205, 2025.

Abstract

In this work, we present a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA). DNA seeks a set of reasonable "options" for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to "explain" an agent’s options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute meaningfully different "options" in simulation, including a brief comparison to related approaches in the stochastic optimization field of Quality Diversity.
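The paper itself details the shaping construction and the epsilon-optimality guarantee; as a rough, non-authoritative sketch of the general idea, the toy example below runs tabular Q-learning on a small gridworld, penalizes states visited by the baseline greedy trajectory so a second run finds a qualitatively different route, and keeps the alternative only if its return under the original (unshaped) reward stays within epsilon of the baseline. The gridworld, the trajectory-overlap penalty, and all constants (LAM, EPS) are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

# Toy sketch (not the paper's method): diverse near-optimal policies via
# reward shaping on a 5x5 gridworld. All constants are illustrative.
N, GAMMA, ALPHA, EPISODES, EPS_GREEDY = 5, 0.95, 0.5, 3000, 0.2
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
rng = np.random.default_rng(0)

def step(s, a):
    # Clamp moves to the grid; goal is the bottom-right corner.
    r_, c_ = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    s2 = (min(max(r_, 0), N - 1), min(max(c_, 0), N - 1))
    done = s2 == (N - 1, N - 1)
    return s2, (1.0 if done else -0.01), done

def q_learn(shaping=lambda s: 0.0):
    # Tabular Q-learning with an additive shaping term on the next state.
    Q = np.zeros((N, N, 4))
    for _ in range(EPISODES):
        s, done, t = (0, 0), False, 0
        while not done and t < 200:
            a = rng.integers(4) if rng.random() < EPS_GREEDY else int(Q[s].argmax())
            s2, r, done = step(s, a)
            target = r + shaping(s2) + (0.0 if done else GAMMA * Q[s2].max())
            Q[s][a] += ALPHA * (target - Q[s][a])
            s, t = s2, t + 1
    return Q

def rollout(Q):
    # Greedy trajectory from the start state, capped to avoid loops.
    s, traj = (0, 0), [(0, 0)]
    for _ in range(50):
        s, _, done = step(s, int(Q[s].argmax()))
        traj.append(s)
        if done:
            break
    return traj

Q_star = q_learn()                 # baseline policy
base_traj = set(rollout(Q_star))   # states the baseline visits

# Shaping: penalize landing on the baseline trajectory's states, nudging
# the learner toward a qualitatively different route to the same goal.
LAM = 0.05  # penalty weight (illustrative)
Q_alt = q_learn(shaping=lambda s: -LAM if s in base_traj else 0.0)

def ret(traj):
    # Discounted return of a trajectory under the ORIGINAL reward.
    return sum(GAMMA**t * (1.0 if s == (N - 1, N - 1) else -0.01)
               for t, s in enumerate(traj[1:]))

EPS = 0.1  # near-optimality tolerance (illustrative)
v_star, v_alt = ret(rollout(Q_star)), ret(rollout(Q_alt))
print("baseline:   ", rollout(Q_star))
print("alternative:", rollout(Q_alt))
print(f"eps-optimal alternative: {v_alt >= v_star - EPS}")
```

Evaluating the alternative under the original reward, rather than the shaped one, is what separates "diverse and still near-optimal" from merely "different"; in this toy both routes have equal length, so the check passes trivially.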

Cite this Paper


BibTeX
@InProceedings{pmlr-v283-brindise25a,
  title     = {“What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives},
  author    = {Brindise, Noel and Hebbar, Vijeth and Shah, Riya and Langbort, Cedric},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  pages     = {1194--1205},
  year      = {2025},
  editor    = {Ozay, Necmiye and Balzano, Laura and Panagou, Dimitra and Abate, Alessandro},
  volume    = {283},
  series    = {Proceedings of Machine Learning Research},
  month     = {04--06 Jun},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v283/main/assets/brindise25a/brindise25a.pdf},
  url       = {https://proceedings.mlr.press/v283/brindise25a.html},
  abstract  = {In this work, we present a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA). DNA seeks a set of reasonable "options" for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to "explain" an agent’s options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute meaningfully different "options" in simulation, including a brief comparison to related approaches in the stochastic optimization field of Quality Diversity.}
}
Endnote
%0 Conference Paper
%T “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives
%A Noel Brindise
%A Vijeth Hebbar
%A Riya Shah
%A Cedric Langbort
%B Proceedings of the 7th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Necmiye Ozay
%E Laura Balzano
%E Dimitra Panagou
%E Alessandro Abate
%F pmlr-v283-brindise25a
%I PMLR
%P 1194--1205
%U https://proceedings.mlr.press/v283/brindise25a.html
%V 283
%X In this work, we present a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA). DNA seeks a set of reasonable "options" for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to "explain" an agent’s options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute meaningfully different "options" in simulation, including a brief comparison to related approaches in the stochastic optimization field of Quality Diversity.
APA
Brindise, N., Hebbar, V., Shah, R. & Langbort, C. (2025). “What are my options?”: Explaining RL Agents with Diverse Near-Optimal Alternatives. Proceedings of the 7th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 283:1194-1205. Available from https://proceedings.mlr.press/v283/brindise25a.html.
