Towards creating longer genetic sequences with GANs: Generation in principal component space

Antoine Szatkownik, Cyril Furtlehner, Guillaume Charpiat, Burak Yelmen, Flora Jay
Proceedings of the 18th Machine Learning in Computational Biology meeting, PMLR 240:110-122, 2024.

Abstract

Synthetic data generation via generative modeling has recently become a prominent research field in genomics, with applications ranging from functional sequence design to high-quality, privacy-preserving artificial in silico genomes. Following a body of work on Artificial Genomes (AGs) created via various generative models trained with raw genomic input, we propose a conceptually different approach to address the issues of scalability and complexity of genomic data generation in very high dimensions. Our method combines dimensionality reduction, achieved by Principal Component Analysis (PCA), and a Generative Adversarial Network (GAN) learning in this reduced space. We compare the quality of AGs generated by our approach with AGs generated by the established models and report improvements on capturing population structure and linkage disequilibrium.
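
The pipeline outlined in the abstract (project genomes into principal component space, generate there, map back) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy random 0/1 haplotype matrix, and it replaces the GAN with a per-component Gaussian sampler purely as a stand-in for the generator. The function names (`fit_pca`, `to_pc_space`, `from_pc_space`, `sample_artificial_genomes`) and the 0.5 threshold used to recover binary alleles are hypothetical choices made for illustration.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return column means and the top principal axes (rows) of X."""
    mean = X.mean(axis=0)
    # Thin SVD of the centered data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def to_pc_space(X, mean, components):
    """Project samples (rows of X) onto the retained principal axes."""
    return (X - mean) @ components.T


def from_pc_space(Z, mean, components):
    """Map PC-space coordinates back to the original feature space."""
    return Z @ components + mean


def sample_artificial_genomes(X, n_components, n_samples, rng):
    """Hypothetical helper: sample in PC space, then map back to 0/1 alleles."""
    mean, components = fit_pca(X, n_components)
    Z_real = to_pc_space(X, mean, components)
    # Stand-in for the GAN generator (an assumption for illustration only):
    # independent Gaussians matched to each PC's mean and standard deviation.
    Z_fake = rng.normal(Z_real.mean(axis=0), Z_real.std(axis=0),
                        size=(n_samples, n_components))
    # Threshold the continuous reconstruction to recover binary alleles.
    return (from_pc_space(Z_fake, mean, components) > 0.5).astype(np.int8)


rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(50, 200)).astype(float)  # toy haplotype matrix
artificial = sample_artificial_genomes(X, n_components=10, n_samples=20, rng=rng)
print(artificial.shape)
```

In this sketch the generator only ever sees `n_components` dimensions rather than the full number of sites, which is the scalability argument the abstract makes; a trained GAN would take the place of the Gaussian sampler.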

Cite this Paper


BibTeX
@InProceedings{pmlr-v240-szatkownik24a,
  title = {Towards creating longer genetic sequences with GANs: Generation in principal component space},
  author = {Szatkownik, Antoine and Furtlehner, Cyril and Charpiat, Guillaume and Yelmen, Burak and Jay, Flora},
  booktitle = {Proceedings of the 18th Machine Learning in Computational Biology meeting},
  pages = {110--122},
  year = {2024},
  editor = {Knowles, David A. and Mostafavi, Sara},
  volume = {240},
  series = {Proceedings of Machine Learning Research},
  month = {30 Nov--01 Dec},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v240/szatkownik24a/szatkownik24a.pdf},
  url = {https://proceedings.mlr.press/v240/szatkownik24a.html},
  abstract = {Synthetic data generation via generative modeling has recently become a prominent research field in genomics, with applications ranging from functional sequence design to high-quality, privacy-preserving artificial in silico genomes. Following a body of work on Artificial Genomes (AGs) created via various generative models trained with raw genomic input, we propose a conceptually different approach to address the issues of scalability and complexity of genomic data generation in very high dimensions. Our method combines dimensionality reduction, achieved by Principal Component Analysis (PCA), and a Generative Adversarial Network (GAN) learning in this reduced space. We compare the quality of AGs generated by our approach with AGs generated by the established models and report improvements on capturing population structure and linkage disequilibrium.}
}
Endnote
%0 Conference Paper
%T Towards creating longer genetic sequences with GANs: Generation in principal component space
%A Antoine Szatkownik
%A Cyril Furtlehner
%A Guillaume Charpiat
%A Burak Yelmen
%A Flora Jay
%B Proceedings of the 18th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2024
%E David A. Knowles
%E Sara Mostafavi
%F pmlr-v240-szatkownik24a
%I PMLR
%P 110--122
%U https://proceedings.mlr.press/v240/szatkownik24a.html
%V 240
%X Synthetic data generation via generative modeling has recently become a prominent research field in genomics, with applications ranging from functional sequence design to high-quality, privacy-preserving artificial in silico genomes. Following a body of work on Artificial Genomes (AGs) created via various generative models trained with raw genomic input, we propose a conceptually different approach to address the issues of scalability and complexity of genomic data generation in very high dimensions. Our method combines dimensionality reduction, achieved by Principal Component Analysis (PCA), and a Generative Adversarial Network (GAN) learning in this reduced space. We compare the quality of AGs generated by our approach with AGs generated by the established models and report improvements on capturing population structure and linkage disequilibrium.
APA
Szatkownik, A., Furtlehner, C., Charpiat, G., Yelmen, B., & Jay, F. (2024). Towards creating longer genetic sequences with GANs: Generation in principal component space. Proceedings of the 18th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 240:110-122. Available from https://proceedings.mlr.press/v240/szatkownik24a.html.
