Locally Substitutable Languages for Enhanced Inductive Leaps

François Coste; Gaëlle Garet; Jacques Nicolas

Proceedings of the Eleventh International Conference on Grammatical Inference, PMLR 21:97-111, 2012.

Abstract

Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high-throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predicting the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and k,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real protein sequence set.

Cite this Paper

BibTeX

@InProceedings{pmlr-v21-coste12a,
  title = 	 {Locally Substitutable Languages for Enhanced Inductive Leaps},
  author = 	 {Coste, François and Garet, Gaëlle and Nicolas, Jacques},
  booktitle = 	 {Proceedings of the Eleventh International Conference on Grammatical Inference},
  pages = 	 {97--111},
  year = 	 {2012},
  editor = 	 {Heinz, Jeffrey and Higuera, Colin and Oates, Tim},
  volume = 	 {21},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {University of Maryland, College Park, MD, USA},
  month = 	 {05--08 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v21/coste12a/coste12a.pdf},
  url = 	 {https://proceedings.mlr.press/v21/coste12a.html},
  abstract = 	 {Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high-throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predicting the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and \emph{k,l}-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real protein sequence set.}
}

Endnote

%0 Conference Paper
%T Locally Substitutable Languages for Enhanced Inductive Leaps
%A François Coste
%A Gaëlle Garet
%A Jacques Nicolas
%B Proceedings of the Eleventh International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2012
%E Jeffrey Heinz
%E Colin Higuera
%E Tim Oates
%F pmlr-v21-coste12a
%I PMLR
%P 97--111
%U https://proceedings.mlr.press/v21/coste12a.html
%V 21
%X Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high-throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predicting the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and k,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real protein sequence set.

RIS

TY  - CPAPER
TI  - Locally Substitutable Languages for Enhanced Inductive Leaps
AU  - François Coste
AU  - Gaëlle Garet
AU  - Jacques Nicolas
BT  - Proceedings of the Eleventh International Conference on Grammatical Inference
DA  - 2012/08/16
ED  - Jeffrey Heinz
ED  - Colin Higuera
ED  - Tim Oates
ID  - pmlr-v21-coste12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 21
SP  - 97
EP  - 111
L1  - http://proceedings.mlr.press/v21/coste12a/coste12a.pdf
UR  - https://proceedings.mlr.press/v21/coste12a.html
AB  - Genomic banks are fed continuously by large sets of DNA or RNA sequences coming from high-throughput machines. Protein annotation is a task of first importance with respect to these banks. It consists of retrieving the genes that code for proteins within the sequences and then predicting the function of these new proteins in the cell by comparison with known families. Many methods have been designed to characterize protein families and discover new members, mainly based on subsets of regular expressions or simple Hidden Markov Models. We are interested in more expressive models that are able to capture the long-range characteristic interactions occurring in the spatial structure of the analyzed protein family. Starting from the work of Clark and Eyraud (2007) and Yoshinaka (2008) on inference of substitutable and k,l-substitutable languages respectively, we introduce new classes of substitutable languages using local rather than global substitutability, a reasonable assumption with respect to protein structures to enhance inductive leaps performed by least generalized generalization approaches. The concepts are illustrated on a first experiment using a real protein sequence set.
ER  -

APA

Coste, F., Garet, G. & Nicolas, J. (2012). Locally Substitutable Languages for Enhanced Inductive Leaps. Proceedings of the Eleventh International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 21:97-111. Available from https://proceedings.mlr.press/v21/coste12a.html.
