Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition

Jiong Zhang, Ian En-Hsu Yen, Pradeep Ravikumar, Inderjit Dhillon
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1514-1522, 2017.

Abstract

Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm, which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-zhang17b, title = {{Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition}}, author = {Zhang, Jiong and Yen, Ian En-Hsu and Ravikumar, Pradeep and Dhillon, Inderjit}, booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics}, pages = {1514--1522}, year = {2017}, editor = {Singh, Aarti and Zhu, Jerry}, volume = {54}, series = {Proceedings of Machine Learning Research}, month = {20--22 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v54/zhang17b/zhang17b.pdf}, url = {https://proceedings.mlr.press/v54/zhang17b.html}, abstract = {Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm, which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.} }
Endnote
%0 Conference Paper %T Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition %A Jiong Zhang %A Ian En-Hsu Yen %A Pradeep Ravikumar %A Inderjit Dhillon %B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2017 %E Aarti Singh %E Jerry Zhu %F pmlr-v54-zhang17b %I PMLR %P 1514--1522 %U https://proceedings.mlr.press/v54/zhang17b.html %V 54 %X Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm, which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.
APA
Zhang, J., Yen, I.E., Ravikumar, P. & Dhillon, I.. (2017). Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:1514-1522 Available from https://proceedings.mlr.press/v54/zhang17b.html.

Related Material