Locally Substitutable Languages for Enhanced Inductive Leaps

François Coste, Gaëlle Garet, Jacques Nicolas
Proceedings of the Eleventh International Conference on Grammatical Inference, PMLR 21:97-111, 2012.

Abstract

Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predict the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and \emphk,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real proteic sequence set.

Cite this Paper


BibTeX
@InProceedings{pmlr-v21-coste12a, title = {Locally Substitutable Languages for Enhanced Inductive Leaps}, author = {Coste, François and Garet, Gaëlle and Nicolas, Jacques}, booktitle = {Proceedings of the Eleventh International Conference on Grammatical Inference}, pages = {97--111}, year = {2012}, editor = {Heinz, Jeffrey and Higuera, Colin and Oates, Tim}, volume = {21}, series = {Proceedings of Machine Learning Research}, address = {University of Maryland, College Park, MD, USA}, month = {05--08 Sep}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v21/coste12a/coste12a.pdf}, url = {https://proceedings.mlr.press/v21/coste12a.html}, abstract = {Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predict the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and \emphk,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real proteic sequence set.} }
Endnote
%0 Conference Paper %T Locally Substitutable Languages for Enhanced Inductive Leaps %A François Coste %A Gaëlle Garet %A Jacques Nicolas %B Proceedings of the Eleventh International Conference on Grammatical Inference %C Proceedings of Machine Learning Research %D 2012 %E Jeffrey Heinz %E Colin Higuera %E Tim Oates %F pmlr-v21-coste12a %I PMLR %P 97--111 %U https://proceedings.mlr.press/v21/coste12a.html %V 21 %X Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predict the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and \emphk,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real proteic sequence set.
RIS
TY - CPAPER TI - Locally Substitutable Languages for Enhanced Inductive Leaps AU - François Coste AU - Gaëlle Garet AU - Jacques Nicolas BT - Proceedings of the Eleventh International Conference on Grammatical Inference DA - 2012/08/16 ED - Jeffrey Heinz ED - Colin Higuera ED - Tim Oates ID - pmlr-v21-coste12a PB - PMLR DP - Proceedings of Machine Learning Research VL - 21 SP - 97 EP - 111 L1 - http://proceedings.mlr.press/v21/coste12a/coste12a.pdf UR - https://proceedings.mlr.press/v21/coste12a.html AB - Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predict the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and \emphk,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real proteic sequence set. ER -
APA
Coste, F., Garet, G. & Nicolas, J.. (2012). Locally Substitutable Languages for Enhanced Inductive Leaps. Proceedings of the Eleventh International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 21:97-111 Available from https://proceedings.mlr.press/v21/coste12a.html.

Related Material