GFlowNet Training by Policy Gradients

Puhua Niu, Shili Wu, Mingzhou Fan, Xiaoning Qian
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:38344-38380, 2024.

Abstract

Generative Flow Networks (GFlowNets) have been shown to be effective at generating combinatorial objects with desired properties. Here, we propose a new GFlowNet training framework with policy-dependent rewards that bridges maintaining the flow balance of GFlowNets and optimizing the expected accumulated reward in traditional reinforcement learning (RL). This enables the derivation of new policy-based GFlowNet training methods, in contrast to existing methods that resemble value-based RL. It is known that the design of the backward policy affects GFlowNet training efficiency. We further develop a coupled training strategy that jointly solves GFlowNet forward-policy training and backward-policy design. We provide a performance analysis with a theoretical guarantee for our policy-based GFlowNet training. Experiments on both simulated and real-world datasets verify that our policy-based strategies provide advanced RL perspectives for robust gradient estimation, improving GFlowNet performance. Our code is available at: github.com/niupuhua1234/GFN-PG.
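The core idea sketched in the abstract, recasting the flow-balance condition as a policy-dependent return that standard policy-gradient machinery can optimize, can be made concrete with a small example. The following Python/PyTorch sketch is an illustrative toy under simplifying assumptions, not the paper's GFN-PG algorithm: objects are binary strings built bit by bit (a tree-structured space, so the backward policy is trivial and the partition constant drops out of the optimization), the trajectory-balance residual log P_F(tau) - log R(x) serves as a per-trajectory cost, and a REINFORCE estimator with a batch-mean baseline supplies the gradient. The reward function and all names are hypothetical.

    import torch

    L = 6                                           # objects: binary strings of length L
    logits = torch.zeros(L, 2, requires_grad=True)  # tabular forward-policy parameters
    opt = torch.optim.Adam([logits], lr=0.05)

    def log_reward(bits):                  # toy reward: strings with more 1s score higher
        return torch.log(torch.tensor(1.0 + sum(bits)))

    for _ in range(500):
        log_pfs, deltas = [], []
        for _ in range(32):                # sample a batch of trajectories
            bits, log_pf = [], torch.zeros(())
            for t in range(L):             # roll out the forward policy bit by bit
                dist = torch.distributions.Categorical(logits=logits[t])
                a = dist.sample()
                log_pf = log_pf + dist.log_prob(a)
                bits.append(int(a))
            # Policy-dependent cost: flow-balance residual log P_F(tau) - log R(x).
            # P_B(tau|x) = 1 here since each string has a unique build order, and
            # log Z is a constant that does not affect the minimizing policy.
            deltas.append(log_pf.detach() - log_reward(bits))
            log_pfs.append(log_pf)
        deltas = torch.stack(deltas)
        advantage = deltas - deltas.mean() # baseline as a simple control variate
        # REINFORCE: grad E[delta] = E[(delta - baseline) * grad log P_F(tau)]
        loss = (advantage * torch.stack(log_pfs)).mean()
        opt.zero_grad(); loss.backward(); opt.step()

Minimizing the expected residual is, up to a constant, the KL divergence between the forward trajectory distribution and the reward-matched target, so the trained policy samples strings roughly in proportion to their rewards; the batch-mean baseline is one simple instance of the variance-reduction ("robust gradient estimation") perspective the abstract highlights.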

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-niu24c,
  title     = {{GF}low{N}et Training by Policy Gradients},
  author    = {Niu, Puhua and Wu, Shili and Fan, Mingzhou and Qian, Xiaoning},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {38344--38380},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/niu24c/niu24c.pdf},
  url       = {https://proceedings.mlr.press/v235/niu24c.html},
  abstract  = {Generative Flow Networks (GFlowNets) have been shown effective to generate combinatorial objects with desired properties. We here propose a new GFlowNet training framework, with policy-dependent rewards, that bridges keeping flow balance of GFlowNets to optimizing the expected accumulated reward in traditional Reinforcement-Learning (RL). This enables the derivation of new policy-based GFlowNet training methods, in contrast to existing ones resembling value-based RL. It is known that the design of backward policies in GFlowNet training affects efficiency. We further develop a coupled training strategy that jointly solves GFlowNet forward policy training and backward policy design. Performance analysis is provided with a theoretical guarantee of our policy-based GFlowNet training. Experiments on both simulated and real-world datasets verify that our policy-based strategies provide advanced RL perspectives for robust gradient estimation to improve GFlowNet performance. Our code is available at: github.com/niupuhua1234/GFN-PG.}
}
Endnote
%0 Conference Paper
%T GFlowNet Training by Policy Gradients
%A Puhua Niu
%A Shili Wu
%A Mingzhou Fan
%A Xiaoning Qian
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-niu24c
%I PMLR
%P 38344--38380
%U https://proceedings.mlr.press/v235/niu24c.html
%V 235
%X Generative Flow Networks (GFlowNets) have been shown effective to generate combinatorial objects with desired properties. We here propose a new GFlowNet training framework, with policy-dependent rewards, that bridges keeping flow balance of GFlowNets to optimizing the expected accumulated reward in traditional Reinforcement-Learning (RL). This enables the derivation of new policy-based GFlowNet training methods, in contrast to existing ones resembling value-based RL. It is known that the design of backward policies in GFlowNet training affects efficiency. We further develop a coupled training strategy that jointly solves GFlowNet forward policy training and backward policy design. Performance analysis is provided with a theoretical guarantee of our policy-based GFlowNet training. Experiments on both simulated and real-world datasets verify that our policy-based strategies provide advanced RL perspectives for robust gradient estimation to improve GFlowNet performance. Our code is available at: github.com/niupuhua1234/GFN-PG.
APA
Niu, P., Wu, S., Fan, M. & Qian, X. (2024). GFlowNet Training by Policy Gradients. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:38344-38380. Available from https://proceedings.mlr.press/v235/niu24c.html.
