A New Multi-choice Reading Comprehension Dataset for Curriculum Learning

Yichan Liang; Jianheng Li; Jian Yin

Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:742-757, 2019.

Abstract

The past few years have witnessed the rapid development of machine reading comprehension (MRC), especially its challenging sub-task, multiple-choice reading comprehension (MCRC), and the release of large-scale datasets has promoted research in this field. Yet previous methods have already achieved high accuracy on MCRC datasets, e.g. RACE. A more difficult dataset that requires more reasoning and inference is therefore needed to evaluate the understanding capability of new methods. In response, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. We further integrate it with RACE-M and RACE-H, collected by Lai et al. (2017) from middle and high school exams respectively, to extend RACE into RACE++. Based on RACE++, we propose a three-stage curriculum learning framework that exploits the ascending difficulty of these three sub-datasets. Statistics show that our collected dataset, RACE-C, is more difficult than RACE’s two sub-datasets, i.e., RACE-M and RACE-H, and experimental results demonstrate that our proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to an extent.
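
The staged idea in the abstract can be illustrated with a minimal sketch. It assumes one training stage per sub-dataset, visited in ascending difficulty (RACE-M, then RACE-H, then RACE-C); the abstract does not spell out the paper's actual schedule, so this ordering logic, the function run_curriculum, and the callback train_one_stage are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Sub-dataset names come from the abstract; the order reflects its claim that
# difficulty ascends from middle school to high school to college exams.
STAGE_ORDER = ("RACE-M", "RACE-H", "RACE-C")


def run_curriculum(
    datasets: Dict[str, Sequence],
    train_one_stage: Callable[[str, Sequence], None],
    order: Sequence[str] = STAGE_ORDER,
) -> List[str]:
    """Train on each sub-dataset in turn, easiest first.

    `train_one_stage` is a hypothetical callback (an assumption, not the
    paper's code) that continues training one model on the given examples.
    Returns the stage names in the order they were visited.
    """
    visited: List[str] = []
    for name in order:
        if name not in datasets:
            raise KeyError(f"missing sub-dataset: {name}")
        train_one_stage(name, datasets[name])
        visited.append(name)
    return visited
```

Passing a callback keeps the sketch independent of any particular model or training library; a caller would supply its own fine-tuning routine there.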

Cite this Paper

BibTeX


@InProceedings{pmlr-v101-liang19a,
  title = 	 {A New Multi-choice Reading Comprehension Dataset for Curriculum Learning},
  author =       {Liang, Yichan and Li, Jianheng and Yin, Jian},
  booktitle = 	 {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages = 	 {742--757},
  year = 	 {2019},
  editor = 	 {Lee, Wee Sun and Suzuki, Taiji},
  volume = 	 {101},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--19 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v101/liang19a/liang19a.pdf},
  url = 	 {https://proceedings.mlr.press/v101/liang19a.html},
  abstract = 	 {The past few years have witnessed the rapid development of machine reading comprehension (MRC), especially its challenging sub-task, multiple-choice reading comprehension (MCRC), and the release of large-scale datasets has promoted research in this field. Yet previous methods have already achieved high accuracy on MCRC datasets, \textit{e.g.} RACE. A more difficult dataset that requires more reasoning and inference is therefore needed to evaluate the understanding capability of new methods. In response, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. We further integrate it with RACE-M and RACE-H, collected by Lai et al. (2017) from middle and high school exams respectively, to extend RACE into RACE++. Based on RACE++, we propose a three-stage curriculum learning framework that exploits the ascending difficulty of these three sub-datasets. Statistics show that our collected dataset, RACE-C, is more difficult than RACE’s two sub-datasets, \textit{i.e.}, RACE-M and RACE-H, and experimental results demonstrate that our proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to an extent.}
}

Endnote

%0 Conference Paper
%T A New Multi-choice Reading Comprehension Dataset for Curriculum Learning
%A Yichan Liang
%A Jianheng Li
%A Jian Yin
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-liang19a
%I PMLR
%P 742--757
%U https://proceedings.mlr.press/v101/liang19a.html
%V 101
%X The past few years have witnessed the rapid development of machine reading comprehension (MRC), especially its challenging sub-task, multiple-choice reading comprehension (MCRC), and the release of large-scale datasets has promoted research in this field. Yet previous methods have already achieved high accuracy on MCRC datasets, e.g. RACE. A more difficult dataset that requires more reasoning and inference is therefore needed to evaluate the understanding capability of new methods. In response, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. We further integrate it with RACE-M and RACE-H, collected by Lai et al. (2017) from middle and high school exams respectively, to extend RACE into RACE++. Based on RACE++, we propose a three-stage curriculum learning framework that exploits the ascending difficulty of these three sub-datasets. Statistics show that our collected dataset, RACE-C, is more difficult than RACE’s two sub-datasets, i.e., RACE-M and RACE-H, and experimental results demonstrate that our proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to an extent.

APA


Liang, Y., Li, J. & Yin, J. (2019). A New Multi-choice Reading Comprehension Dataset for Curriculum Learning. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:742-757. Available from https://proceedings.mlr.press/v101/liang19a.html.
