Metagenomic Binning using Connectivity-constrained Variational Autoencoders

Andre Lamurias, Alessandro Tibo, Katja Hose, Mads Albertsen, Thomas Dyhre Nielsen
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:18471-18481, 2023.

Abstract

Current state-of-the-art techniques for metagenomic binning only utilize local features for the individual DNA sequences (contigs), neglecting additional information such as the assembly graph, in which the contigs are connected according to overlapping reads, and gene markers identified in the contigs. In this paper, we propose the use of a Variational AutoEncoder (VAE) tailored to leverage auxiliary structural information about contig relations when learning contig representations for subsequent metagenomic binning. Our method, CCVAE, improves on previous work that used VAEs for learning latent representations of the individual contigs, by constraining these representations according to the connectivity information from the assembly graph. Additionally, we incorporate into the model additional information in the form of marker genes to better differentiate contigs from different genomes. Our experiments on both simulated and real-world datasets demonstrate that CCVAE outperforms current state-of-the-art techniques, thus providing a more effective method for metagenomic binning.

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-lamurias23a,
  title = {Metagenomic Binning using Connectivity-constrained Variational Autoencoders},
  author = {Lamurias, Andre and Tibo, Alessandro and Hose, Katja and Albertsen, Mads and Nielsen, Thomas Dyhre},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {18471--18481},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v202/lamurias23a/lamurias23a.pdf},
  url = {https://proceedings.mlr.press/v202/lamurias23a.html},
  abstract = {Current state-of-the-art techniques for metagenomic binning only utilize local features for the individual DNA sequences (contigs), neglecting additional information such as the assembly graph, in which the contigs are connected according to overlapping reads, and gene markers identified in the contigs. In this paper, we propose the use of a Variational AutoEncoder (VAE) tailored to leverage auxiliary structural information about contig relations when learning contig representations for subsequent metagenomic binning. Our method, CCVAE, improves on previous work that used VAEs for learning latent representations of the individual contigs, by constraining these representations according to the connectivity information from the assembly graph. Additionally, we incorporate into the model additional information in the form of marker genes to better differentiate contigs from different genomes. Our experiments on both simulated and real-world datasets demonstrate that CCVAE outperforms current state-of-the-art techniques, thus providing a more effective method for metagenomic binning.}
}
Endnote
%0 Conference Paper
%T Metagenomic Binning using Connectivity-constrained Variational Autoencoders
%A Andre Lamurias
%A Alessandro Tibo
%A Katja Hose
%A Mads Albertsen
%A Thomas Dyhre Nielsen
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-lamurias23a
%I PMLR
%P 18471--18481
%U https://proceedings.mlr.press/v202/lamurias23a.html
%V 202
%X Current state-of-the-art techniques for metagenomic binning only utilize local features for the individual DNA sequences (contigs), neglecting additional information such as the assembly graph, in which the contigs are connected according to overlapping reads, and gene markers identified in the contigs. In this paper, we propose the use of a Variational AutoEncoder (VAE) tailored to leverage auxiliary structural information about contig relations when learning contig representations for subsequent metagenomic binning. Our method, CCVAE, improves on previous work that used VAEs for learning latent representations of the individual contigs, by constraining these representations according to the connectivity information from the assembly graph. Additionally, we incorporate into the model additional information in the form of marker genes to better differentiate contigs from different genomes. Our experiments on both simulated and real-world datasets demonstrate that CCVAE outperforms current state-of-the-art techniques, thus providing a more effective method for metagenomic binning.
APA
Lamurias, A., Tibo, A., Hose, K., Albertsen, M. & Nielsen, T.D. (2023). Metagenomic Binning using Connectivity-constrained Variational Autoencoders. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:18471-18481. Available from https://proceedings.mlr.press/v202/lamurias23a.html.
