A bottom-up efficient algorithm learning substitutable languages from positive examples

François Coste; Gaëlle Garet; Jacques Nicolas

A bottom-up efficient algorithm learning substitutable languages from positive examples

François Coste, Gaëlle Garet, Jacques Nicolas

The 12th International Conference on Grammatical Inference, PMLR 34:49-63, 2014.

Abstract

Based on Harris’s substitutability criterion, the recent definitions of classes of substitutable languages have led to interesting polynomial learnability results for expressive formal languages. These classes are also promising for practical applications: in natural language analysis, because definitions have strong linguisitic support, but also in biology for modeling protein families, as suggested in our previous study introducing the class of local substitutable languages. But turning recent theoretical advances into practice badly needs truly practical algorithms. We present here an efficient learning algorithm, motivated by intelligibility and parsing efficiency of the result, which directly reduces the positive sample into a small non-redundant canonical grammar of the target substitutable language. Thanks to this new algorithm, we have been able to extend our experimentation to a complete protein dataset confirming that it is possible to learn grammars on proteins with high specificity and good sensitivity by a generalization based on local substitutability.

Cite this Paper

BibTeX


@InProceedings{pmlr-v34-coste14a,
  title = 	 {A bottom-up efficient algorithm learning substitutable languages from positive examples},
  author = 	 {Coste, François and Garet, Gaëlle and Nicolas, Jacques},
  booktitle = 	 {The 12th International Conference on Grammatical Inference},
  pages = 	 {49--63},
  year = 	 {2014},
  editor = 	 {Clark, Alexander and Kanazawa, Makoto and Yoshinaka, Ryo},
  volume = 	 {34},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Kyoto, Japan},
  month = 	 {17--19 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v34/coste14a.pdf},
  url = 	 {https://proceedings.mlr.press/v34/coste14a.html},
  abstract = 	 {Based on Harris’s substitutability criterion, the recent definitions of classes of substitutable languages have led to interesting polynomial learnability results for expressive formal languages. These classes are also promising for practical applications: in natural language analysis, because definitions have strong linguisitic support, but also in biology for modeling protein families, as suggested in our previous study introducing the class of local substitutable languages. But turning recent theoretical advances into practice badly needs truly practical algorithms. We present here an efficient learning algorithm, motivated by intelligibility and parsing efficiency of the result, which directly reduces the positive sample into a small non-redundant canonical grammar of the target substitutable language.  Thanks to this new algorithm, we have been able to extend our experimentation to a complete protein dataset confirming that it is possible to learn grammars on proteins with high specificity and good sensitivity by a generalization based on local substitutability.}
}

Endnote

%0 Conference Paper
%T A bottom-up efficient algorithm learning substitutable languages from positive examples
%A François Coste
%A Gaëlle Garet
%A Jacques Nicolas
%B The 12th International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2014
%E Alexander Clark
%E Makoto Kanazawa
%E Ryo Yoshinaka	
%F pmlr-v34-coste14a
%I PMLR
%P 49--63
%U https://proceedings.mlr.press/v34/coste14a.html
%V 34
%X Based on Harris’s substitutability criterion, the recent definitions of classes of substitutable languages have led to interesting polynomial learnability results for expressive formal languages. These classes are also promising for practical applications: in natural language analysis, because definitions have strong linguisitic support, but also in biology for modeling protein families, as suggested in our previous study introducing the class of local substitutable languages. But turning recent theoretical advances into practice badly needs truly practical algorithms. We present here an efficient learning algorithm, motivated by intelligibility and parsing efficiency of the result, which directly reduces the positive sample into a small non-redundant canonical grammar of the target substitutable language.  Thanks to this new algorithm, we have been able to extend our experimentation to a complete protein dataset confirming that it is possible to learn grammars on proteins with high specificity and good sensitivity by a generalization based on local substitutability.

RIS


TY  - CPAPER
TI  - A bottom-up efficient algorithm learning substitutable languages from positive examples
AU  - François Coste
AU  - Gaëlle Garet
AU  - Jacques Nicolas
BT  - The 12th International Conference on Grammatical Inference
DA  - 2014/08/30
ED  - Alexander Clark
ED  - Makoto Kanazawa
ED  - Ryo Yoshinaka	
ID  - pmlr-v34-coste14a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 34
SP  - 49
EP  - 63
L1  - http://proceedings.mlr.press/v34/coste14a.pdf
UR  - https://proceedings.mlr.press/v34/coste14a.html
AB  - Based on Harris’s substitutability criterion, the recent definitions of classes of substitutable languages have led to interesting polynomial learnability results for expressive formal languages. These classes are also promising for practical applications: in natural language analysis, because definitions have strong linguisitic support, but also in biology for modeling protein families, as suggested in our previous study introducing the class of local substitutable languages. But turning recent theoretical advances into practice badly needs truly practical algorithms. We present here an efficient learning algorithm, motivated by intelligibility and parsing efficiency of the result, which directly reduces the positive sample into a small non-redundant canonical grammar of the target substitutable language.  Thanks to this new algorithm, we have been able to extend our experimentation to a complete protein dataset confirming that it is possible to learn grammars on proteins with high specificity and good sensitivity by a generalization based on local substitutability.
ER  -

APA


Coste, F., Garet, G. & Nicolas, J.. (2014). A bottom-up efficient algorithm learning substitutable languages from positive examples. The 12th International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 34:49-63 Available from https://proceedings.mlr.press/v34/coste14a.html.

Related Material

Download PDF