Towards improving full-length ribosome density prediction by bridging sequence and graph-based representations

Mohan Vamsi Nallapareddy; Francesco Craighero; Cédric Gobet; Felix Naef; Pierre Vandergheynst

Proceedings of the 19th Machine Learning in Computational Biology meeting, PMLR 261:38-52, 2024.

Abstract

Translation elongation plays an important role in regulating protein concentrations in the cell, and dysregulation of this process has been linked to several human diseases. In this study, we use data from ribo-seq experiments to model ribosome densities, and in turn, predict the speed of translation. The proposed method, RiboGL, combines graph and recurrent neural networks to account for both graph and sequence-based features. The model takes a graph representing the secondary structure of the mRNA sequence as input, which incorporates both sequence and structural codon neighbors. In our experiments, RiboGL greatly outperforms the state-of-the-art RiboMIMO model for ribosome density prediction. We also conduct ablation studies to justify the design choices made in building the pipeline. Additionally, we use gradient-based interpretability to understand how the codon context and the structural neighbors affect the ribosome density at the A-site. By individually analyzing the genes in the dataset, we elucidate how structural neighbors could also potentially play a role in defining the ribosome density. Importantly, since these neighbors can be far away in the sequence, a recurrent model alone could not easily extract this information. This study lays the foundation for understanding how the mRNA secondary structure can be exploited for ribosome density prediction, and how in the future other graph modalities such as features from the nascent polypeptide can be used to further our understanding of translation in general.
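The abstract does not say exactly how the secondary-structure graph is built. What follows is a minimal sketch under stated assumptions, not the RiboGL implementation.

It assumes the following:

- The mRNA secondary structure is given in dot-bracket notation, for example as output by an RNA folding tool.
- Each codon is one node.
- Consecutive codons are joined by a sequence edge.
- Two codons are joined by a structural edge whenever any of their nucleotides are base-paired.

The function names `pair_table` and `codon_graph` are hypothetical.

```python
from typing import List, Set, Tuple


def pair_table(dot_bracket: str) -> List[int]:
    """Return the partner index of each nucleotide, or -1 if unpaired.

    Assumption: only '(' and ')' denote pairs; pseudoknot brackets are ignored.
    """
    stack: List[int] = []
    partners = [-1] * len(dot_bracket)
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partners[i] = j
            partners[j] = i
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partners


def codon_graph(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Build undirected codon-level edges (u, v) with u < v (hypothetical helper).

    Sequence edges link consecutive codons; structural edges link codons that
    contain base-paired nucleotides, however far apart they are in sequence.
    """
    if len(dot_bracket) % 3 != 0:
        raise ValueError("structure length must be a multiple of 3")
    n_codons = len(dot_bracket) // 3
    # Sequence neighbours: codon c is linked to codon c + 1.
    edges: Set[Tuple[int, int]] = {(c, c + 1) for c in range(n_codons - 1)}
    # Structural neighbours: map each base pair (i, j) to codons i // 3 and j // 3.
    for i, j in enumerate(pair_table(dot_bracket)):
        if j > i:
            u, v = i // 3, j // 3
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return edges


print(sorted(codon_graph("(((......)))")))
```

Take the hairpin `(((......)))`. Codons 0 and 3 become structural neighbours even though they are three codons apart in sequence. A graph layer can pass information along that edge directly. A recurrent model would instead have to carry it across every intervening position, which is the limitation the abstract points to.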

Cite this Paper

BibTeX


@InProceedings{pmlr-v261-nallapareddy24a,
  title = 	 {Towards improving full-length ribosome density prediction by bridging sequence and graph-based representations},
  author =       {Nallapareddy, Mohan Vamsi and Craighero, Francesco and Gobet, C\'edric and Naef, Felix and Vandergheynst, Pierre},
  booktitle = 	 {Proceedings of the 19th Machine Learning in Computational Biology meeting},
  pages = 	 {38--52},
  year = 	 {2024},
  editor = 	 {Knowles, David A and Mostafavi, Sara},
  volume = 	 {261},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {05--06 Sep},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v261/main/assets/nallapareddy24a/nallapareddy24a.pdf},
  url = 	 {https://proceedings.mlr.press/v261/nallapareddy24a.html},
  abstract = 	 {Translation elongation plays an important role in regulating protein concentrations in the cell, and dysregulation of this process has been linked to several human diseases. In this study, we use data from ribo-seq experiments to model ribosome densities, and in turn, predict the speed of translation. The proposed method, RiboGL, combines graph and recurrent neural networks to account for both graph and sequence-based features. The model takes a graph representing the secondary structure of the mRNA sequence as input, which incorporates both sequence and structural codon neighbors. In our experiments, RiboGL greatly outperforms the state-of-the-art RiboMIMO model for ribosome density prediction. We also conduct ablation studies to justify the design choices made in building the pipeline. Additionally, we use gradient-based interpretability to understand how the codon context and the structural neighbors affect the ribosome density at the A-site. By individually analyzing the genes in the dataset, we elucidate how structural neighbors could also potentially play a role in defining the ribosome density. Importantly, since these neighbors can be far away in the sequence, a recurrent model alone could not easily extract this information. This study lays the foundation for understanding how the mRNA secondary structure can be exploited for ribosome density prediction, and how in the future other graph modalities such as features from the nascent polypeptide can be used to further our understanding of translation in general.}
}

Endnote

%0 Conference Paper
%T Towards improving full-length ribosome density prediction by bridging sequence and graph-based representations
%A Mohan Vamsi Nallapareddy
%A Francesco Craighero
%A Cédric Gobet
%A Felix Naef
%A Pierre Vandergheynst
%B Proceedings of the 19th Machine Learning in Computational Biology meeting
%C Proceedings of Machine Learning Research
%D 2024
%E David A Knowles
%E Sara Mostafavi
%F pmlr-v261-nallapareddy24a
%I PMLR
%P 38--52
%U https://proceedings.mlr.press/v261/nallapareddy24a.html
%V 261
%X Translation elongation plays an important role in regulating protein concentrations in the cell, and dysregulation of this process has been linked to several human diseases. In this study, we use data from ribo-seq experiments to model ribosome densities, and in turn, predict the speed of translation. The proposed method, RiboGL, combines graph and recurrent neural networks to account for both graph and sequence-based features. The model takes a graph representing the secondary structure of the mRNA sequence as input, which incorporates both sequence and structural codon neighbors. In our experiments, RiboGL greatly outperforms the state-of-the-art RiboMIMO model for ribosome density prediction. We also conduct ablation studies to justify the design choices made in building the pipeline. Additionally, we use gradient-based interpretability to understand how the codon context and the structural neighbors affect the ribosome density at the A-site. By individually analyzing the genes in the dataset, we elucidate how structural neighbors could also potentially play a role in defining the ribosome density. Importantly, since these neighbors can be far away in the sequence, a recurrent model alone could not easily extract this information. This study lays the foundation for understanding how the mRNA secondary structure can be exploited for ribosome density prediction, and how in the future other graph modalities such as features from the nascent polypeptide can be used to further our understanding of translation in general.

APA


Nallapareddy, M.V., Craighero, F., Gobet, C., Naef, F. & Vandergheynst, P. (2024). Towards improving full-length ribosome density prediction by bridging sequence and graph-based representations. Proceedings of the 19th Machine Learning in Computational Biology meeting, in Proceedings of Machine Learning Research 261:38-52. Available from https://proceedings.mlr.press/v261/nallapareddy24a.html.