Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective

Cheng Tan, Zhangyang Gao, Hanqun Cao, Xingran Chen, Ge Wang, Lirong Wu, Jun Xia, Jiangbin Zheng, Stan Z. Li
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:47564-47578, 2024.

Abstract

The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate the RNA secondary structure prediction as a K-Rook problem, thereby simplifying the prediction process into probabilistic matching within a finite solution space. Building on this innovative perspective, we introduce RFold, a simple yet effective method that learns to predict the most matching K-Rook solution from the given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components to reduce the matching complexity, simplifying the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance and about eight times faster inference efficiency than the state-of-the-art approaches. The code is available at https://github.com/A4Bio/RFold.
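The abstract describes the output side of RFold only at a high level. A K-Rook solution places at most one "rook" (base pair) in each row and each column of the n x n contact map, and the row-wise/column-wise decomposition is what keeps the output valid. The following is a minimal pure-Python sketch of that idea under stated assumptions. It symmetrizes a score matrix and masks out disallowed pairs. It then keeps a pair (i, j) only when j is the row-wise argmax of row i and i is the column-wise argmax of column j. The pairing rules (`CANONICAL_PAIRS`, `MIN_LOOP`), the `min_score` threshold, and every function name are hypothetical illustrations. They are not RFold's published code or API; the repository linked above is the authoritative implementation.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]

# Hypothetical pairing rule: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
# Hypothetical minimum hairpin loop: positions i and j may pair only if |i - j| > MIN_LOOP.
MIN_LOOP = 3


def constraint_mask(seq: str, min_loop: int = MIN_LOOP) -> List[List[int]]:
    """Return an n x n 0/1 mask marking the position pairs allowed to pair."""
    n = len(seq)
    return [
        [1 if (seq[i], seq[j]) in CANONICAL_PAIRS and abs(i - j) > min_loop else 0
         for j in range(n)]
        for i in range(n)
    ]


def _argmax(values: List[float]) -> Optional[int]:
    """Index of the first maximum finite value, or None if every entry is -inf."""
    best, best_idx = float("-inf"), None
    for idx, v in enumerate(values):
        if v > best:
            best, best_idx = v, idx
    return best_idx


def row_col_argmax(scores: Matrix, mask: List[List[int]], min_score: float = 0.0) -> List[List[int]]:
    """Decode a contact map as the element-wise product of a row-wise one-hot
    argmax and a column-wise one-hot argmax, so every row and every column
    keeps at most one 1 (a non-attacking rook placement)."""
    n = len(scores)
    neg_inf = float("-inf")
    # Symmetrize the raw scores and send disallowed pairs to -inf.
    s = [[(scores[i][j] + scores[j][i]) / 2 if mask[i][j] else neg_inf
          for j in range(n)] for i in range(n)]
    row_arg = [_argmax(s[i]) for i in range(n)]
    col_arg = [_argmax([s[i][j] for i in range(n)]) for j in range(n)]
    contact = [[0] * n for _ in range(n)]
    for i in range(n):
        j = row_arg[i]
        # Keep (i, j) only if i is also the best row for column j and the
        # score clears the (hypothetical) min_score threshold.
        if j is not None and col_arg[j] == i and s[i][j] > min_score:
            contact[i][j] = 1
    return contact


def is_valid_k_rook(contact: List[List[int]]) -> bool:
    """Check symmetry and at most one pairing per row and per column."""
    n = len(contact)
    for i in range(n):
        if sum(contact[i]) > 1 or sum(contact[r][i] for r in range(n)) > 1:
            return False
        if any(contact[i][j] != contact[j][i] for j in range(n)):
            return False
    return True


def pairs(contact: List[List[int]]) -> List[Tuple[int, int]]:
    """List the predicted base pairs (i, j) with i < j."""
    n = len(contact)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if contact[i][j]]


if __name__ == "__main__":
    seq = "GGGAAAUCCC"
    n = len(seq)
    # Hand-set scores standing in for a model's output.
    scores = [[-1.0] * n for _ in range(n)]
    for i, j in [(0, 9), (1, 8), (2, 7)]:
        scores[i][j] = scores[j][i] = 2.0
    scores[2][6] = 1.5  # distractor: a G-U wobble competing with the (2, 7) pair
    contact = row_col_argmax(scores, constraint_mask(seq))
    print(pairs(contact))            # [(0, 9), (1, 8), (2, 7)]
    print(is_valid_k_rook(contact))  # True
```

A decoded pair must be the mutual best of its row and its column. As a result, the competing G-U score at (2, 6) in the example is discarded and the output remains a valid placement. In a real model the `scores` matrix would be predicted from the sequence; here it is set by hand.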

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-tan24a,
  title     = {Deciphering {RNA} Secondary Structure Prediction: A Probabilistic {K-Rook} Matching Perspective},
  author    = {Tan, Cheng and Gao, Zhangyang and Cao, Hanqun and Chen, Xingran and Wang, Ge and Wu, Lirong and Xia, Jun and Zheng, Jiangbin and Li, Stan Z.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {47564--47578},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/tan24a/tan24a.pdf},
  url       = {https://proceedings.mlr.press/v235/tan24a.html},
  abstract  = {The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate the RNA secondary structure prediction as a K-Rook problem, thereby simplifying the prediction process into probabilistic matching within a finite solution space. Building on this innovative perspective, we introduce RFold, a simple yet effective method that learns to predict the most matching K-Rook solution from the given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components to reduce the matching complexity, simplifying the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance and about eight times faster inference efficiency than the state-of-the-art approaches. The code is available at https://github.com/A4Bio/RFold.}
}
Endnote
%0 Conference Paper
%T Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective
%A Cheng Tan
%A Zhangyang Gao
%A Hanqun Cao
%A Xingran Chen
%A Ge Wang
%A Lirong Wu
%A Jun Xia
%A Jiangbin Zheng
%A Stan Z. Li
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-tan24a
%I PMLR
%P 47564--47578
%U https://proceedings.mlr.press/v235/tan24a.html
%V 235
%X The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate the RNA secondary structure prediction as a K-Rook problem, thereby simplifying the prediction process into probabilistic matching within a finite solution space. Building on this innovative perspective, we introduce RFold, a simple yet effective method that learns to predict the most matching K-Rook solution from the given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components to reduce the matching complexity, simplifying the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance and about eight times faster inference efficiency than the state-of-the-art approaches. The code is available at https://github.com/A4Bio/RFold.
APA
Tan, C., Gao, Z., Cao, H., Chen, X., Wang, G., Wu, L., Xia, J., Zheng, J., & Li, S. Z. (2024). Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:47564-47578. Available from https://proceedings.mlr.press/v235/tan24a.html.
