Improving transformer secondary structure predictions with secondary structure "fixing" task
Proceedings of the 18th Machine Learning in Computational Biology meeting, PMLR 240:267-278, 2024.
Abstract
The structure of an RNA molecule can determine its function, so improvements in RNA secondary structure prediction can help in understanding the functions of RNA. Nucleotides in RNA sequences form base-pairing interactions with context-specific preferences that help determine the secondary structure. Structure prediction algorithms, including dynamic programming and machine learning approaches, have been developed to predict the secondary structure. One of the central challenges in predicting secondary structure with deep learning is that these architectures are poor at directly predicting bracketed structures. To overcome this challenge, we present a deep learning approach for predicting secondary structure that takes a predicted structure as input to provide a scaffold for the structure prediction. We find that architectures using LSTM and self-attention-based transformer layers set a strong baseline in the prediction of base pairs (F1=53.73) and improve significantly (F1=59.52) when predictions from dynamic programming methods are provided as input. Model interpretation shows that the attention patterns of different layers of the network are enriched for specific paired regions, or for regions that should be paired. Analysis of neural network models like this one can shed light on possible missed interactions and on which other positions contribute most to the positions the model fixes.
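
To make the "fixing" setup concrete, the following is a minimal, hypothetical sketch of the kind of architecture the abstract describes: a model that embeds the RNA sequence together with a dot-bracket structure predicted by a dynamic programming method, passes them through a BiLSTM followed by self-attention transformer layers, and emits a corrected per-position dot-bracket label. All class names, vocabularies, and hyperparameters here are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of a structure-"fixing" model: the DP-predicted
    # dot-bracket string is provided as an input scaffold alongside the
    # RNA sequence, and the model outputs a corrected structure.
    import torch
    import torch.nn as nn

    SEQ_VOCAB = {c: i for i, c in enumerate("ACGU")}  # nucleotides
    DB_VOCAB = {c: i for i, c in enumerate(".()")}    # dot-bracket symbols

    class StructureFixer(nn.Module):
        def __init__(self, d_model=128, n_heads=4, n_layers=2):
            super().__init__()
            self.seq_emb = nn.Embedding(len(SEQ_VOCAB), d_model)
            self.db_emb = nn.Embedding(len(DB_VOCAB), d_model)
            # BiLSTM over the combined sequence/structure embedding
            self.lstm = nn.LSTM(d_model, d_model // 2, batch_first=True,
                                bidirectional=True)
            # Self-attention transformer layers on top of the LSTM output
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            # Per-position classification back into dot-bracket symbols
            self.head = nn.Linear(d_model, len(DB_VOCAB))

        def forward(self, seq_ids, db_ids):
            # The predicted structure acts as a scaffold: its embedding is
            # added to the sequence embedding at every position.
            x = self.seq_emb(seq_ids) + self.db_emb(db_ids)
            x, _ = self.lstm(x)
            x = self.encoder(x)
            return self.head(x)  # (batch, length, 3) logits

    # Example: "fix" an imperfect DP prediction for a short hairpin,
    # where one base pair was missed by the input structure.
    seq = torch.tensor([[SEQ_VOCAB[c] for c in "GGGAAACCC"]])
    db = torch.tensor([[DB_VOCAB[c] for c in "((.....))"]])
    logits = StructureFixer()(seq, db)
    fixed = "".join(".()"[i] for i in logits.argmax(-1)[0].tolist())

Under this sketch, base-pair F1 scores such as those reported above would be computed by pairing the predicted brackets, comparing the resulting base pairs against the reference structure, and taking the harmonic mean of precision and recall.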