Embarrassingly Parallel GFlowNets

Tiago Silva; Luiz Max Carvalho; Amauri H Souza; Samuel Kaski; Diego Mesquita

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:45406-45431, 2024.

Abstract

GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution, or reward function. However, for large-scale posterior sampling, this may be prohibitive since it incurs traversing the data several times. Moreover, if the data are distributed across clients, employing standard GFlowNets leads to intensive client-server communication. To alleviate both these issues, we propose embarrassingly parallel GFlowNet (EP-GFlowNet). EP-GFlowNet is a provably correct divide-and-conquer method to sample from product distributions of the form $R(\cdot) \propto R_1(\cdot) ... R_N(\cdot)$, e.g., in parallel or federated Bayes, where each $R_n$ is a local posterior defined on a data partition. First, in parallel, we train a local GFlowNet targeting each $R_n$ and send the resulting models to the server. Then, the server learns a global GFlowNet by enforcing our newly proposed aggregating balance condition, requiring a single communication step. Importantly, EP-GFlowNets can also be applied to multi-objective optimization and model reuse. Our experiments illustrate the effectiveness of EP-GFlowNets on multiple tasks, including parallel Bayesian phylogenetics, multi-objective multiset and sequence generation, and federated Bayesian structure learning.
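
The product form $R(\cdot) \propto R_1(\cdot) ... R_N(\cdot)$ can be made concrete on a toy state space small enough to enumerate. The sketch below is not the EP-GFlowNet training procedure (the aggregating balance condition is defined in the paper, not here); it only computes the distribution a global sampler would need to match, assuming each client can report its unnormalized log-reward for every state. The function names and the three-state, two-client example are hypothetical.

```python
import math


def combine_log_rewards(local_log_rewards):
    """Sum per-client log-rewards state by state.

    In log space the product R_1(x) * ... * R_N(x) becomes a sum,
    so log R(x) = sum_n log R_n(x) up to an additive constant.
    """
    states = local_log_rewards[0].keys()
    return {x: sum(lr[x] for lr in local_log_rewards) for x in states}


def normalize(log_reward):
    """Convert unnormalized log-rewards to probabilities (stable log-sum-exp)."""
    m = max(log_reward.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_reward.values()))
    return {x: math.exp(v - log_z) for x, v in log_reward.items()}


# Hypothetical toy problem: three states, two clients with made-up local rewards.
local_rewards = [
    {"a": 1.0, "b": 2.0, "c": 1.0},  # client 1
    {"a": 3.0, "b": 1.0, "c": 1.0},  # client 2
]
local_log_rewards = [
    {x: math.log(r) for x, r in rewards.items()} for rewards in local_rewards
]

# Normalized product of the local rewards: the target of the global sampler.
target = normalize(combine_log_rewards(local_log_rewards))
```

Exact enumeration like this is only feasible for tiny state spaces; for the compositional objects named in the abstract (phylogenies, multisets, sequences, graph structures) the normalizing constant is intractable, which is why a learned sampler is used instead.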

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-silva24a,
  title = 	 {Embarrassingly Parallel {GF}low{N}ets},
  author =       {Silva, Tiago and Carvalho, Luiz Max and Souza, Amauri H and Kaski, Samuel and Mesquita, Diego},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {45406--45431},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/silva24a/silva24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/silva24a.html},
  abstract = 	 {GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution, or reward function. However, for large-scale posterior sampling, this may be prohibitive since it incurs traversing the data several times. Moreover, if the data are distributed across clients, employing standard GFlowNets leads to intensive client-server communication. To alleviate both these issues, we propose embarrassingly parallel GFlowNet (EP-GFlowNet). EP-GFlowNet is a provably correct divide-and-conquer method to sample from product distributions of the form $R(\cdot) \propto R_1(\cdot) ... R_N(\cdot)$ — e.g., in parallel or federated Bayes, where each $R_n$ is a local posterior defined on a data partition. First, in parallel, we train a local GFlowNet targeting each $R_n$ and send the resulting models to the server. Then, the server learns a global GFlowNet by enforcing our newly proposed aggregating balance condition, requiring a single communication step. Importantly, EP-GFlowNets can also be applied to multi-objective optimization and model reuse. Our experiments illustrate the effectiveness of EP-GFlowNets on multiple tasks, including parallel Bayesian phylogenetics, multi-objective multiset and sequence generation, and federated Bayesian structure learning.}
}

Endnote

%0 Conference Paper
%T Embarrassingly Parallel GFlowNets
%A Tiago Silva
%A Luiz Max Carvalho
%A Amauri H Souza
%A Samuel Kaski
%A Diego Mesquita
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-silva24a
%I PMLR
%P 45406--45431
%U https://proceedings.mlr.press/v235/silva24a.html
%V 235
%X GFlowNets are a promising alternative to MCMC sampling for discrete compositional random variables. Training GFlowNets requires repeated evaluations of the unnormalized target distribution, or reward function. However, for large-scale posterior sampling, this may be prohibitive since it incurs traversing the data several times. Moreover, if the data are distributed across clients, employing standard GFlowNets leads to intensive client-server communication. To alleviate both these issues, we propose embarrassingly parallel GFlowNet (EP-GFlowNet). EP-GFlowNet is a provably correct divide-and-conquer method to sample from product distributions of the form $R(\cdot) \propto R_1(\cdot) ... R_N(\cdot)$ — e.g., in parallel or federated Bayes, where each $R_n$ is a local posterior defined on a data partition. First, in parallel, we train a local GFlowNet targeting each $R_n$ and send the resulting models to the server. Then, the server learns a global GFlowNet by enforcing our newly proposed aggregating balance condition, requiring a single communication step. Importantly, EP-GFlowNets can also be applied to multi-objective optimization and model reuse. Our experiments illustrate the effectiveness of EP-GFlowNets on multiple tasks, including parallel Bayesian phylogenetics, multi-objective multiset and sequence generation, and federated Bayesian structure learning.

APA


Silva, T., Carvalho, L.M., Souza, A.H., Kaski, S. & Mesquita, D. (2024). Embarrassingly Parallel GFlowNets. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:45406-45431. Available from https://proceedings.mlr.press/v235/silva24a.html.
