GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL

Sumedh A. Sontakke, Stephen Iota, Zizhao Hu, Arash Mehrjou, Laurent Itti, Bernhard Schölkopf
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:7518-7530, 2022.

Abstract

Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending the successes in supervised learning methods to the reinforcement learning (RL) setting, however, is difficult due to the data generating process - RL agents actively query their environment for data and this data is a function of the policy followed by the agent. Thus, an agent could neglect a shift in the environment if its policy did not lead it to explore the aspect of the environment that shifted. Therefore, to achieve safe and robust generalization in RL, there exists an unmet need for OOD detection through active experimentation. Here, we attempt to bridge this lacuna by first - defining a causal framework for OOD scenarios or environments encountered by RL agents in the wild. Then, we propose a novel task - that of Out-of-Task Distribution (OOTD) detection. We introduce an RL agent which actively experiments in a test environment and subsequently concludes whether it is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it also discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends extant Model-Based RL. We find that our method outperforms the baseline significantly.

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-sontakke22a, title = { GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL }, author = {Sontakke, Sumedh A. and Iota, Stephen and Hu, Zizhao and Mehrjou, Arash and Itti, Laurent and Sch\"olkopf, Bernhard}, booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics}, pages = {7518--7530}, year = {2022}, editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel}, volume = {151}, series = {Proceedings of Machine Learning Research}, month = {28--30 Mar}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v151/sontakke22a/sontakke22a.pdf}, url = {https://proceedings.mlr.press/v151/sontakke22a.html}, abstract = { Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending the successes in supervised learning methods to the reinforcement learning (RL) setting, however, is difficult due to the data generating process - RL agents actively query their environment for data and this data is a function of the policy followed by the agent. Thus, an agent could neglect a shift in the environment if its policy did not lead it to explore the aspect of the environment that shifted. Therefore, to achieve safe and robust generalization in RL, there exists an unmet need for OOD detection through active experimentation. Here, we attempt to bridge this lacuna by first - defining a causal framework for OOD scenarios or environments encountered by RL agents in the wild. Then, we propose a novel task - that of Out-of-Task Distribution (OOTD) detection. We introduce an RL agent which actively experiments in a test environment and subsequently concludes whether it is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it also discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends extant Model-Based RL. We find that our method outperforms the baseline significantly. } }
Endnote
%0 Conference Paper %T GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL %A Sumedh A. Sontakke %A Stephen Iota %A Zizhao Hu %A Arash Mehrjou %A Laurent Itti %A Bernhard Schölkopf %B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2022 %E Gustau Camps-Valls %E Francisco J. R. Ruiz %E Isabel Valera %F pmlr-v151-sontakke22a %I PMLR %P 7518--7530 %U https://proceedings.mlr.press/v151/sontakke22a.html %V 151 %X Out-of-distribution (OOD) detection is a well-studied topic in supervised learning. Extending the successes in supervised learning methods to the reinforcement learning (RL) setting, however, is difficult due to the data generating process - RL agents actively query their environment for data and this data is a function of the policy followed by the agent. Thus, an agent could neglect a shift in the environment if its policy did not lead it to explore the aspect of the environment that shifted. Therefore, to achieve safe and robust generalization in RL, there exists an unmet need for OOD detection through active experimentation. Here, we attempt to bridge this lacuna by first - defining a causal framework for OOD scenarios or environments encountered by RL agents in the wild. Then, we propose a novel task - that of Out-of-Task Distribution (OOTD) detection. We introduce an RL agent which actively experiments in a test environment and subsequently concludes whether it is OOTD or not. We name our method GalilAI, in honor of Galileo Galilei, as it also discovers, among other causal processes, that gravitational acceleration is independent of the mass of a body. Finally, we propose a simple probabilistic neural network baseline for comparison, which extends extant Model-Based RL. We find that our method outperforms the baseline significantly.
APA
Sontakke, S.A., Iota, S., Hu, Z., Mehrjou, A., Itti, L. & Schölkopf, B.. (2022). GalilAI: Out-of-Task Distribution Detection using Causal Active Experimentation for Safe Transfer RL . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:7518-7530 Available from https://proceedings.mlr.press/v151/sontakke22a.html.

Related Material