A New Multi-choice Reading Comprehension Dataset for Curriculum Learning

Yichan Liang, Jianheng Li, Jian Yin
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:742-757, 2019.

Abstract

The past few years have witnessed the rapid development of machine reading comprehension (MRC), especially the challenging sub-task, multiple-choice reading comprehension (MCRC). And the release of large scale datasets promotes the research in this field. Yet previous methods have already achieved high accuracy of the MCRC datasets, \textit{e.g.} RACE. It’s necessary to propose a more difficult dataset which needs more reasoning and inference for evaluating the understanding capability of new methods. To respond to such demand, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. And further we integrate it with RACE-M and RACE-H, collected by {{Lai et al.}} ({2017}) from middle and high school exams respectively, to extend RACE to be RACE++. Based on RACE++, we propose a three-stage curriculum learning framework, which is able to use the best of the characteristic that the difficulty level within these three sub-datasets is in ascending order. Statistics show the higher difficulty level of our collected dataset, RACE-C, compared to RACE’s two sub-datasets, \textit{i.e.}, RACE-M and RACE-H. And experimental results demonstrate that our proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to an extent.

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-liang19a, title = {A New Multi-choice Reading Comprehension Dataset for Curriculum Learning}, author = {Liang, Yichan and Li, Jianheng and Yin, Jian}, booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning}, pages = {742--757}, year = {2019}, editor = {Lee, Wee Sun and Suzuki, Taiji}, volume = {101}, series = {Proceedings of Machine Learning Research}, month = {17--19 Nov}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v101/liang19a/liang19a.pdf}, url = {https://proceedings.mlr.press/v101/liang19a.html}, abstract = {The past few years have witnessed the rapid development of machine reading comprehension (MRC), especially the challenging sub-task, multiple-choice reading comprehension (MCRC). And the release of large scale datasets promotes the research in this field. Yet previous methods have already achieved high accuracy of the MCRC datasets, \textit{e.g.} RACE. It’s necessary to propose a more difficult dataset which needs more reasoning and inference for evaluating the understanding capability of new methods. To respond to such demand, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. And further we integrate it with RACE-M and RACE-H, collected by {{Lai et al.}} ({2017}) from middle and high school exams respectively, to extend RACE to be RACE++. Based on RACE++, we propose a three-stage curriculum learning framework, which is able to use the best of the characteristic that the difficulty level within these three sub-datasets is in ascending order. Statistics show the higher difficulty level of our collected dataset, RACE-C, compared to RACE’s two sub-datasets, \textit{i.e.}, RACE-M and RACE-H. And experimental results demonstrate that our proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to an extent.} }
Endnote
%0 Conference Paper %T A New Multi-choice Reading Comprehension Dataset for Curriculum Learning %A Yichan Liang %A Jianheng Li %A Jian Yin %B Proceedings of The Eleventh Asian Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Wee Sun Lee %E Taiji Suzuki %F pmlr-v101-liang19a %I PMLR %P 742--757 %U https://proceedings.mlr.press/v101/liang19a.html %V 101 %X The past few years have witnessed the rapid development of machine reading comprehension (MRC), especially the challenging sub-task, multiple-choice reading comprehension (MCRC). And the release of large scale datasets promotes the research in this field. Yet previous methods have already achieved high accuracy of the MCRC datasets, \textit{e.g.} RACE. It’s necessary to propose a more difficult dataset which needs more reasoning and inference for evaluating the understanding capability of new methods. To respond to such demand, we present RACE-C, a new multi-choice reading comprehension dataset collected from college English examinations in China. And further we integrate it with RACE-M and RACE-H, collected by {{Lai et al.}} ({2017}) from middle and high school exams respectively, to extend RACE to be RACE++. Based on RACE++, we propose a three-stage curriculum learning framework, which is able to use the best of the characteristic that the difficulty level within these three sub-datasets is in ascending order. Statistics show the higher difficulty level of our collected dataset, RACE-C, compared to RACE’s two sub-datasets, \textit{i.e.}, RACE-M and RACE-H. And experimental results demonstrate that our proposed three-stage curriculum learning approach improves the performance of the machine reading comprehension model to an extent.
APA
Liang, Y., Li, J. & Yin, J.. (2019). A New Multi-choice Reading Comprehension Dataset for Curriculum Learning. Proceedings of The Eleventh Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 101:742-757 Available from https://proceedings.mlr.press/v101/liang19a.html.

Related Material