PALM: Probabilistic Area Loss Minimization for Protein Sequence Alignment

Fan Ding, Nan Jiang, Jianzhu Ma, Jian Peng, Jinbo Xu, Yexiang Xue
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:1100-1109, 2021.

Abstract

Protein sequence alignment is a fundamental problem in computational structural biology and is widely used for protein 3D structure prediction and protein homology detection. Most existing programs for detecting protein sequence alignments are based on the likelihood information of amino acids and are sensitive to alignment noise. We present PALM, a novel method for modeling pairwise protein structure alignments that uses the area distance to reduce biological measurement noise. PALM generatively learns the alignment of two protein sequences with a probabilistic area distance objective, which can denoise the measurement errors contained in the ground-truth alignments. During learning, we show that the optimization is computationally efficient by estimating the gradients via dynamically sampling alignments. Empirically, we show that PALM can generate sequence alignments with higher precision and recall, as well as smaller area distance, than the competing methods, especially for long protein sequences and remote homologies. This study suggests that PALM is worth considering for learning over large-scale protein sequence alignment problems.

Cite this Paper


BibTeX
@InProceedings{pmlr-v161-ding21c,
  title = {PALM: Probabilistic Area Loss Minimization for Protein Sequence Alignment},
  author = {Ding, Fan and Jiang, Nan and Ma, Jianzhu and Peng, Jian and Xu, Jinbo and Xue, Yexiang},
  booktitle = {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},
  pages = {1100--1109},
  year = {2021},
  editor = {de Campos, Cassio and Maathuis, Marloes H.},
  volume = {161},
  series = {Proceedings of Machine Learning Research},
  month = {27--30 Jul},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v161/ding21c/ding21c.pdf},
  url = {https://proceedings.mlr.press/v161/ding21c.html},
  abstract = {Protein sequence alignment is a fundamental problem in computational structural biology and is widely used for protein 3D structure prediction and protein homology detection. Most existing programs for detecting protein sequence alignments are based on the likelihood information of amino acids and are sensitive to alignment noise. We present PALM, a novel method for modeling pairwise protein structure alignments that uses the area distance to reduce biological measurement noise. PALM generatively learns the alignment of two protein sequences with a probabilistic area distance objective, which can denoise the measurement errors contained in the ground-truth alignments. During learning, we show that the optimization is computationally efficient by estimating the gradients via dynamically sampling alignments. Empirically, we show that PALM can generate sequence alignments with higher precision and recall, as well as smaller area distance, than the competing methods, especially for long protein sequences and remote homologies. This study suggests that PALM is worth considering for learning over large-scale protein sequence alignment problems.}
}
Endnote
%0 Conference Paper
%T PALM: Probabilistic Area Loss Minimization for Protein Sequence Alignment
%A Fan Ding
%A Nan Jiang
%A Jianzhu Ma
%A Jian Peng
%A Jinbo Xu
%A Yexiang Xue
%B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2021
%E Cassio de Campos
%E Marloes H. Maathuis
%F pmlr-v161-ding21c
%I PMLR
%P 1100--1109
%U https://proceedings.mlr.press/v161/ding21c.html
%V 161
%X Protein sequence alignment is a fundamental problem in computational structural biology and is widely used for protein 3D structure prediction and protein homology detection. Most existing programs for detecting protein sequence alignments are based on the likelihood information of amino acids and are sensitive to alignment noise. We present PALM, a novel method for modeling pairwise protein structure alignments that uses the area distance to reduce biological measurement noise. PALM generatively learns the alignment of two protein sequences with a probabilistic area distance objective, which can denoise the measurement errors contained in the ground-truth alignments. During learning, we show that the optimization is computationally efficient by estimating the gradients via dynamically sampling alignments. Empirically, we show that PALM can generate sequence alignments with higher precision and recall, as well as smaller area distance, than the competing methods, especially for long protein sequences and remote homologies. This study suggests that PALM is worth considering for learning over large-scale protein sequence alignment problems.
APA
Ding, F., Jiang, N., Ma, J., Peng, J., Xu, J., & Xue, Y. (2021). PALM: Probabilistic Area Loss Minimization for Protein Sequence Alignment. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:1100-1109. Available from https://proceedings.mlr.press/v161/ding21c.html.