How to measure the topological quality of protein parse trees?

Mateusz Pyzik; François Coste; Witold Dyrka

Proceedings of The 14th International Conference on Grammatical Inference 2018, PMLR 93:118-138, 2019.

Abstract

Human readability and, consequently, interpretability are often considered a key advantage of grammatical descriptors. Beyond natural language, this is also true in the analysis of biological sequences of RNA, typically modeled by grammars of at least context-free expressiveness. However, in protein sequence analysis, the explanatory power of grammatical descriptors beyond regular has never been thoroughly assessed. Since the biological meaning of a protein molecule is directly related to its spatial structure, it is justified to expect that the parse tree of a protein sequence reflects the spatial structure of the protein. In this work, we propose and assess quantitative measures for comparing the topology of the parse tree of a context-free grammar with the topology of the protein structure, succinctly represented by a contact map. Our results are potentially interesting beyond their bioinformatic context, wherever a reference matrix of dependencies between sequence constituents is available.

Cite this Paper

BibTeX


@InProceedings{pmlr-v93-pyzik19a,
  title     = {How to measure the topological quality of protein parse trees?},
  author    = {Pyzik, Mateusz and Coste, Fran\c{c}ois and Dyrka, Witold},
  booktitle = {Proceedings of The 14th International Conference on Grammatical Inference 2018},
  pages     = {118--138},
  year      = {2019},
  editor    = {Unold, Olgierd and Dyrka, Witold and Wieczorek, Wojciech},
  volume    = {93},
  series    = {Proceedings of Machine Learning Research},
  month     = {feb},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v93/pyzik19a/pyzik19a.pdf},
  url       = {https://proceedings.mlr.press/v93/pyzik19a.html},
  abstract = 	 {Human readability and, consequently, interpretability is often considered a key advantage of grammatical descriptors. Beyond the natural language, this is also true in analyzing biological sequences of RNA, typically modeled by grammars of at least context-free level of expressiveness. However, in protein sequence analysis, the explanatory power of grammatical descriptors beyond regular has never been thoroughly assessed. Since the biological meaning of a protein molecule is directly related to its spatial structure, it is justified to expect that the parse tree of a protein sequence reflects the spatial structure of the protein. In this piece of research, we propose and assess quantitative measures for comparing topology of the parse tree of a context-free grammar with topology of the protein structure succinctly represented by a contact map. Our results are potentially interesting beyond its bioinformatic context wherever a reference matrix of dependencies between sequence constituents is available.}
}

Endnote

%0 Conference Paper
%T How to measure the topological quality of protein parse trees?
%A Mateusz Pyzik
%A François Coste
%A Witold Dyrka
%B Proceedings of The 14th International Conference on Grammatical Inference 2018
%C Proceedings of Machine Learning Research
%D 2019
%E Olgierd Unold
%E Witold Dyrka
%E Wojciech Wieczorek
%F pmlr-v93-pyzik19a
%I PMLR
%P 118--138
%U https://proceedings.mlr.press/v93/pyzik19a.html
%V 93
%X Human readability and, consequently, interpretability is often considered a key advantage of grammatical descriptors. Beyond the natural language, this is also true in analyzing biological sequences of RNA, typically modeled by grammars of at least context-free level of expressiveness. However, in protein sequence analysis, the explanatory power of grammatical descriptors beyond regular has never been thoroughly assessed. Since the biological meaning of a protein molecule is directly related to its spatial structure, it is justified to expect that the parse tree of a protein sequence reflects the spatial structure of the protein. In this piece of research, we propose and assess quantitative measures for comparing topology of the parse tree of a context-free grammar with topology of the protein structure succinctly represented by a contact map. Our results are potentially interesting beyond its bioinformatic context wherever a reference matrix of dependencies between sequence constituents is available.

APA


Pyzik, M., Coste, F., & Dyrka, W. (2019). How to measure the topological quality of protein parse trees? Proceedings of The 14th International Conference on Grammatical Inference 2018, in Proceedings of Machine Learning Research 93:118-138. Available from https://proceedings.mlr.press/v93/pyzik19a.html.
