Learning the Target Network in Function Space

Kavosh Asadi; Yao Liu; Shoham Sabach; Ming Yin; Rasool Fakoor

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:1902-1923, 2024.

Abstract

We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.
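The abstract contrasts two ways of keeping the target network in sync with the online network. One copies parameters, which gives parameter-space equivalence. The other makes the two networks agree on their predicted values, which gives function-space equivalence. The Python sketch below illustrates the second idea on a toy problem.

It rests on several assumptions:
- linear value functions V(s) = params · φ(s);
- a two-state Markov reward process;
- full-batch gradient descent.

The helpers `lookahead` and `replicate`, the features, and all step sizes are hypothetical choices made for illustration. None of them is taken from the paper, so the sketch should not be read as the authors' LR implementation.

```python
# Minimal sketch under assumptions: linear V(s) = params . phi(s), a toy
# 2-state chain, full-batch gradient descent. Illustrative only; this is
# not the authors' implementation of Lookahead-Replicate.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def values(params, states):
    """Value of each state under a linear value function."""
    return [dot(params, phi) for phi in states]


def lookahead(theta, w, batch, gamma, lr, steps):
    """Online update (assumed form): TD regression of V_theta toward
    r + gamma * V_w(s'), with the target parameters w held fixed."""
    theta = list(theta)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for phi, r, phi_next in batch:
            delta = dot(theta, phi) - (r + gamma * dot(w, phi_next))
            for i, f in enumerate(phi):
                grad[i] += delta * f / len(batch)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def replicate(w, theta, states, lr, steps):
    """Target update in function space (assumed form): fit V_w to V_theta on
    sampled states instead of copying theta into w."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for phi in states:
            diff = dot(w, phi) - dot(theta, phi)
            for i, f in enumerate(phi):
                grad[i] += diff * f / len(states)
        w = [x - lr * g for x, g in zip(w, grad)]
    return w


# Hypothetical toy chain: s0 -> s1 with reward 0, s1 -> s1 with reward 1.
# With gamma = 0.5 the true values are V(s1) = 1 / (1 - 0.5) = 2 and
# V(s0) = 0.5 * 2 = 1. The 3-dimensional features are redundant, so
# different parameter vectors can represent the same value function.
phi0, phi1 = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
batch = [(phi0, 0.0, phi1), (phi1, 1.0, phi1)]
states = [phi0, phi1]

theta = [0.0, 0.0, 0.0]
w = [0.5, 0.5, -0.5]  # same values as theta (0 on both states), different parameters

for _ in range(50):
    theta = lookahead(theta, w, batch, gamma=0.5, lr=0.5, steps=60)
    w = replicate(w, theta, states, lr=0.5, steps=60)

print([round(v, 3) for v in values(w, states)])  # -> [1.0, 2.0]
```

In this toy run, the target values settle near the true values of 1 and 2, and V_w matches V_θ on both states. Even so, w and θ still differ by approximately (0.5, 0.5, -0.5), a direction that the redundant features cannot distinguish. A hard copy w ← θ would have forced the parameters to coincide. The function-space update, by contrast, only constrains what the target network predicts.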

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-asadi24a,
  title     = {Learning the Target Network in Function Space},
  author    = {Asadi, Kavosh and Liu, Yao and Sabach, Shoham and Yin, Ming and Fakoor, Rasool},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {1902--1923},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/asadi24a/asadi24a.pdf},
  url       = {https://proceedings.mlr.press/v235/asadi24a.html},
  abstract  = {We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.}
}

Endnote

%0 Conference Paper
%T Learning the Target Network in Function Space
%A Kavosh Asadi
%A Yao Liu
%A Shoham Sabach
%A Ming Yin
%A Rasool Fakoor
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-asadi24a
%I PMLR
%P 1902--1923
%U https://proceedings.mlr.press/v235/asadi24a.html
%V 235
%X We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.

APA


Asadi, K., Liu, Y., Sabach, S., Yin, M. & Fakoor, R. (2024). Learning the Target Network in Function Space. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:1902-1923. Available from https://proceedings.mlr.press/v235/asadi24a.html.
