Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models

Valentin I. Spitkovsky; Hiyan Alshawi; Daniel Jurafsky

Proceedings of the Eleventh International Conference on Grammatical Inference, PMLR 21:189-194, 2012.

Abstract

Modern grammar induction systems often employ curriculum learning strategies that begin by training on a subset of all available input that is considered simpler than the full data. Traditionally, filtering has been at granularities of whole input units, e.g., discarding entire sentences with too many words or punctuation marks. We propose instead viewing inter-punctuation fragments as atoms, initially, thus making some simple phrases and clauses of complex sentences available to training sooner. Splitting input text at punctuation in this way improved our state-of-the-art grammar induction pipeline. We observe that resulting partial data, i.e., mostly incomplete sentence fragments, can be analyzed using reduced parsing models which, we show, can be easier to bootstrap than more nuanced grammars. Starting with a new, bare dependency-and-boundary model (DBM-0), our grammar inducer attained 61.2% directed dependency accuracy on Section 23 (all sentences) of the Wall Street Journal corpus: more than 2% higher than previous published results for this task.
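As an illustration of the fragment-based filtering described in the abstract, the following minimal Python sketch contrasts whole-sentence length filtering with splitting at punctuation first. It is not the authors' implementation: the function names, the `max_len` threshold, and the rule that a token is punctuation when it consists only of ASCII punctuation characters are all assumptions made for this example.

```python
import string

# Assumption: a token is punctuation if every character in it is an
# ASCII punctuation mark. The paper's own criterion may differ.
PUNCT = set(string.punctuation)


def is_punct(token):
    """True if the token is non-empty and made only of punctuation marks."""
    return bool(token) and all(ch in PUNCT for ch in token)


def split_at_punctuation(tokens):
    """Return the maximal runs of non-punctuation tokens in a sentence."""
    fragments, current = [], []
    for tok in tokens:
        if is_punct(tok):
            if current:  # close the fragment that ends at this mark
                fragments.append(current)
                current = []
        else:
            current.append(tok)
    if current:  # trailing fragment with no closing punctuation
        fragments.append(current)
    return fragments


def filter_sentences(sentences, max_len):
    """Whole-unit filtering: keep sentences with at most max_len words."""
    return [s for s in sentences
            if sum(not is_punct(t) for t in s) <= max_len]


def filter_fragments(sentences, max_len):
    """Fragment filtering: split first, then keep the short fragments."""
    return [f for s in sentences
            for f in split_at_punctuation(s)
            if len(f) <= max_len]


sentence = "Although it rained , the match , a final , went ahead .".split()
short_sentences = filter_sentences([sentence], max_len=3)
short_fragments = filter_fragments([sentence], max_len=3)
```

In this toy case the nine-word sentence is discarded entirely by the whole-sentence filter, while the fragment filter still yields four short fragments for early training.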

Cite this Paper

BibTeX

@InProceedings{pmlr-v21-spitkovsky12a,
  title = 	 {Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models},
  author = 	 {Spitkovsky, Valentin I. and Alshawi, Hiyan and Jurafsky, Daniel},
  booktitle = 	 {Proceedings of the Eleventh International Conference on Grammatical Inference},
  pages = 	 {189--194},
  year = 	 {2012},
  editor = 	 {Heinz, Jeffrey and Higuera, Colin and Oates, Tim},
  volume = 	 {21},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {University of Maryland, College Park, MD, USA},
  month = 	 {05--08 Sep},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v21/spitkovsky12a/spitkovsky12a.pdf},
  url = 	 {https://proceedings.mlr.press/v21/spitkovsky12a.html},
  abstract = 	 {Modern grammar induction systems often employ curriculum learning strategies that begin by training on a subset of all available input that is considered simpler than the full data.  Traditionally, filtering has been at granularities of whole input units, e.g., discarding entire sentences with too many words or punctuation marks. We propose instead viewing inter-punctuation fragments as atoms, initially, thus making some simple phrases and clauses of complex sentences available to training sooner.  Splitting input text at punctuation in this way improved our state-of-the-art grammar induction pipeline.  We observe that resulting partial data, i.e., mostly incomplete sentence fragments, can be analyzed using reduced parsing models which, we show, can be easier to bootstrap than more nuanced grammars.  Starting with a new, bare dependency-and-boundary model (DBM-0), our grammar inducer attained 61.2% directed dependency accuracy on Section 23 (all sentences) of the Wall Street Journal corpus: more than 2% higher than previous published results for this task.}
}

Endnote

%0 Conference Paper
%T Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models
%A Valentin I. Spitkovsky
%A Hiyan Alshawi
%A Daniel Jurafsky
%B Proceedings of the Eleventh International Conference on Grammatical Inference
%C Proceedings of Machine Learning Research
%D 2012
%E Jeffrey Heinz
%E Colin Higuera
%E Tim Oates
%F pmlr-v21-spitkovsky12a
%I PMLR
%P 189--194
%U https://proceedings.mlr.press/v21/spitkovsky12a.html
%V 21
%X Modern grammar induction systems often employ curriculum learning strategies that begin by training on a subset of all available input that is considered simpler than the full data.  Traditionally, filtering has been at granularities of whole input units, e.g., discarding entire sentences with too many words or punctuation marks. We propose instead viewing inter-punctuation fragments as atoms, initially, thus making some simple phrases and clauses of complex sentences available to training sooner.  Splitting input text at punctuation in this way improved our state-of-the-art grammar induction pipeline.  We observe that resulting partial data, i.e., mostly incomplete sentence fragments, can be analyzed using reduced parsing models which, we show, can be easier to bootstrap than more nuanced grammars.  Starting with a new, bare dependency-and-boundary model (DBM-0), our grammar inducer attained 61.2% directed dependency accuracy on Section 23 (all sentences) of the Wall Street Journal corpus: more than 2% higher than previous published results for this task.

RIS

TY  - CPAPER
TI  - Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models
AU  - Valentin I. Spitkovsky
AU  - Hiyan Alshawi
AU  - Daniel Jurafsky
BT  - Proceedings of the Eleventh International Conference on Grammatical Inference
DA  - 2012/08/16
ED  - Jeffrey Heinz
ED  - Colin Higuera
ED  - Tim Oates
ID  - pmlr-v21-spitkovsky12a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 21
SP  - 189
EP  - 194
L1  - http://proceedings.mlr.press/v21/spitkovsky12a/spitkovsky12a.pdf
UR  - https://proceedings.mlr.press/v21/spitkovsky12a.html
AB  - Modern grammar induction systems often employ curriculum learning strategies that begin by training on a subset of all available input that is considered simpler than the full data.  Traditionally, filtering has been at granularities of whole input units, e.g., discarding entire sentences with too many words or punctuation marks. We propose instead viewing inter-punctuation fragments as atoms, initially, thus making some simple phrases and clauses of complex sentences available to training sooner.  Splitting input text at punctuation in this way improved our state-of-the-art grammar induction pipeline.  We observe that resulting partial data, i.e., mostly incomplete sentence fragments, can be analyzed using reduced parsing models which, we show, can be easier to bootstrap than more nuanced grammars.  Starting with a new, bare dependency-and-boundary model (DBM-0), our grammar inducer attained 61.2% directed dependency accuracy on Section 23 (all sentences) of the Wall Street Journal corpus: more than 2% higher than previous published results for this task.
ER  -

APA

Spitkovsky, V.I., Alshawi, H. & Jurafsky, D. (2012). Bootstrapping Dependency Grammar Inducers from Incomplete Sentence Fragments via Austere Models. Proceedings of the Eleventh International Conference on Grammatical Inference, in Proceedings of Machine Learning Research 21:189-194. Available from https://proceedings.mlr.press/v21/spitkovsky12a.html.
