On learning history-based policies for controlling Markov decision processes

Gandharv Patil; Aditya Mahajan; Doina Precup

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3511-3519, 2024.

Abstract

Reinforcement learning (RL) folklore suggests that methods of function approximation based on history, such as recurrent neural networks or state abstractions that include past information, outperform those without memory, because function approximation in Markov decision processes (MDP) can lead to a scenario akin to dealing with a partially observable MDP (POMDP). However, formal analysis of history-based algorithms has been limited, with most existing frameworks concentrating on features without historical context. In this paper, we introduce a theoretical framework to examine the behaviour of RL algorithms that control an MDP using feature abstraction mappings based on historical data. Additionally, we leverage this framework to develop a practical RL algorithm and assess its performance across various continuous control tasks.

Cite this Paper

BibTeX

@InProceedings{pmlr-v238-patil24b,
  title = 	 {On learning history-based policies for controlling {M}arkov decision processes},
  author =       {Patil, Gandharv and Mahajan, Aditya and Precup, Doina},
  booktitle = 	 {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3511--3519},
  year = 	 {2024},
  editor = 	 {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume = 	 {238},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {02--04 May},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v238/patil24b/patil24b.pdf},
  url = 	 {https://proceedings.mlr.press/v238/patil24b.html},
  abstract = 	 {Reinforcement learning (RL) folklore suggests that methods of function approximation based on history, such as recurrent neural networks or state abstractions that include past information, outperform those without memory, because function approximation in Markov decision processes (MDP) can lead to a scenario akin to dealing with a partially observable MDP (POMDP). However, formal analysis of history-based algorithms has been limited, with most existing frameworks concentrating on features without historical context. In this paper, we introduce a theoretical framework to examine the behaviour of RL algorithms that control an MDP using feature abstraction mappings based on historical data. Additionally, we leverage this framework to develop a practical RL algorithm and assess its performance across various continuous control tasks.}
}

Endnote

%0 Conference Paper
%T On learning history-based policies for controlling Markov decision processes
%A Gandharv Patil
%A Aditya Mahajan
%A Doina Precup
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-patil24b
%I PMLR
%P 3511--3519
%U https://proceedings.mlr.press/v238/patil24b.html
%V 238
%X Reinforcement learning (RL) folklore suggests that methods of function approximation based on history, such as recurrent neural networks or state abstractions that include past information, outperform those without memory, because function approximation in Markov decision processes (MDP) can lead to a scenario akin to dealing with a partially observable MDP (POMDP). However, formal analysis of history-based algorithms has been limited, with most existing frameworks concentrating on features without historical context. In this paper, we introduce a theoretical framework to examine the behaviour of RL algorithms that control an MDP using feature abstraction mappings based on historical data. Additionally, we leverage this framework to develop a practical RL algorithm and assess its performance across various continuous control tasks.

APA

Patil, G., Mahajan, A. & Precup, D. (2024). On learning history-based policies for controlling Markov decision processes. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3511-3519. Available from https://proceedings.mlr.press/v238/patil24b.html.
