How to measure the topological quality of protein parse trees?

Mateusz Pyzik, François Coste, Witold Dyrka
Proceedings of The 14th International Conference on Grammatical Inference 2018, PMLR 93:118-138, 2019.

Abstract

Human readability and, consequently, interpretability is often considered a key advantage of grammatical descriptors. Beyond natural language, this is also true in analyzing biological sequences of RNA, typically modeled by grammars of at least context-free level of expressiveness. However, in protein sequence analysis, the explanatory power of grammatical descriptors beyond regular has never been thoroughly assessed. Since the biological meaning of a protein molecule is directly related to its spatial structure, it is justified to expect that the parse tree of a protein sequence reflects the spatial structure of the protein. In this work, we propose and assess quantitative measures for comparing the topology of the parse tree of a context-free grammar with the topology of the protein structure succinctly represented by a contact map. Our results are potentially interesting beyond their bioinformatic context wherever a reference matrix of dependencies between sequence constituents is available.

Cite this Paper


BibTeX
@InProceedings{pmlr-v93-pyzik19a,
  title = {How to measure the topological quality of protein parse trees?},
  author = {Pyzik, Mateusz and Coste, Fran\c{c}ois and Dyrka, Witold},
  booktitle = {Proceedings of The 14th International Conference on Grammatical Inference 2018},
  pages = {118--138},
  year = {2019},
  editor = {Unold, Olgierd and Dyrka, Witold and Wieczorek, Wojciech},
  volume = {93},
  series = {Proceedings of Machine Learning Research},
  month = {feb},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v93/pyzik19a/pyzik19a.pdf},
  url = {https://proceedings.mlr.press/v93/pyzik19a.html},
  abstract = {Human readability and, consequently, interpretability is often considered a key advantage of grammatical descriptors. Beyond the natural language, this is also true in analyzing biological sequences of RNA, typically modeled by grammars of at least context-free level of expressiveness. However, in protein sequence analysis, the explanatory power of grammatical descriptors beyond regular has never been thoroughly assessed. Since the biological meaning of a protein molecule is directly related to its spatial structure, it is justified to expect that the parse tree of a protein sequence reflects the spatial structure of the protein. In this piece of research, we propose and assess quantitative measures for comparing topology of the parse tree of a context-free grammar with topology of the protein structure succinctly represented by a contact map. Our results are potentially interesting beyond its bioinformatic context wherever a reference matrix of dependencies between sequence constituents is available.}
}
Endnote
%0 Conference Paper
%T How to measure the topological quality of protein parse trees?
%A Mateusz Pyzik
%A François Coste
%A Witold Dyrka
%B Proceedings of The 14th International Conference on Grammatical Inference 2018
%C Proceedings of Machine Learning Research
%D 2019
%E Olgierd Unold
%E Witold Dyrka
%E Wojciech Wieczorek
%F pmlr-v93-pyzik19a
%I PMLR
%P 118--138
%U https://proceedings.mlr.press/v93/pyzik19a.html
%V 93
%X Human readability and, consequently, interpretability is often considered a key advantage of grammatical descriptors. Beyond the natural language, this is also true in analyzing biological sequences of RNA, typically modeled by grammars of at least context-free level of expressiveness. However, in protein sequence analysis, the explanatory power of grammatical descriptors beyond regular has never been thoroughly assessed. Since the biological meaning of a protein molecule is directly related to its spatial structure, it is justified to expect that the parse tree of a protein sequence reflects the spatial structure of the protein. In this piece of research, we propose and assess quantitative measures for comparing topology of the parse tree of a context-free grammar with topology of the protein structure succinctly represented by a contact map. Our results are potentially interesting beyond its bioinformatic context wherever a reference matrix of dependencies between sequence constituents is available.
APA
Pyzik, M., Coste, F., & Dyrka, W. (2019). How to measure the topological quality of protein parse trees? Proceedings of The 14th International Conference on Grammatical Inference 2018, in Proceedings of Machine Learning Research 93:118-138. Available from https://proceedings.mlr.press/v93/pyzik19a.html.
