Learning the Target Network in Function Space

Kavosh Asadi, Yao Liu, Shoham Sabach, Ming Yin, Rasool Fakoor
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:1902-1923, 2024.

Abstract

We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.
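
The abstract contrasts the usual parameter-space target update, in which the target network's weights are copied (or Polyak-averaged) from the online network, with a function-space notion of equivalence in which the target is only required to produce the same value predictions. The PyTorch sketch below is a rough illustration of that distinction only, not the paper's Lookahead-Replicate algorithm; the function and parameter names (hard_update, replicate_in_function_space, num_steps, the toy networks) are hypothetical.

# Hypothetical sketch (not the paper's LR algorithm): contrast a standard
# parameter-space target update with a function-space "replicate" step that
# fits the target network to the online network's predicted values.
import copy
import torch
import torch.nn as nn

def hard_update(online: nn.Module, target: nn.Module) -> None:
    # Parameter-space equivalence: copy the weights directly.
    target.load_state_dict(online.state_dict())

def replicate_in_function_space(online: nn.Module,
                                target: nn.Module,
                                states: torch.Tensor,
                                num_steps: int = 50,
                                lr: float = 1e-3) -> None:
    # Function-space equivalence: train the target to match the online
    # network's outputs on a batch of states, without requiring equal weights.
    opt = torch.optim.Adam(target.parameters(), lr=lr)
    with torch.no_grad():
        online_values = online(states)  # fixed regression targets
    for _ in range(num_steps):
        loss = nn.functional.mse_loss(target(states), online_values)
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    q_online = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    q_target = copy.deepcopy(q_online)
    batch_of_states = torch.randn(256, 4)  # stand-in for replayed states
    replicate_in_function_space(q_online, q_target, batch_of_states)

After the function-space step, the two networks may hold different parameters while agreeing, approximately, on their value predictions over the sampled states; this is the kind of equivalence the abstract refers to.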

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-asadi24a,
  title     = {Learning the Target Network in Function Space},
  author    = {Asadi, Kavosh and Liu, Yao and Sabach, Shoham and Yin, Ming and Fakoor, Rasool},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {1902--1923},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/asadi24a/asadi24a.pdf},
  url       = {https://proceedings.mlr.press/v235/asadi24a.html}
}
APA
Asadi, K., Liu, Y., Sabach, S., Yin, M. & Fakoor, R. (2024). Learning the Target Network in Function Space. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:1902-1923. Available from https://proceedings.mlr.press/v235/asadi24a.html.
