An Efficient Approach for Multi-Sentence Compression

Elaheh ShafieiBavani, Mohammad Ebrahimi, Raymond K. Wong, Fang Chen
Proceedings of The 8th Asian Conference on Machine Learning, PMLR 63:414-429, 2016.

Abstract

Multi-Sentence Compression (MSC) is of great value to many real-world applications, such as guided microblog summarization, opinion summarization, and newswire summarization. Recently, word graph-based approaches have been proposed and have become popular for MSC. Their key assumption is that redundancy among a set of related sentences provides a reliable way to generate informative and grammatical sentences. In this paper, we propose an effective approach that enhances word graph-based MSC and tackles the issue most state-of-the-art MSC approaches face: improving both informativity and grammaticality at the same time. Our approach consists of three main components: (1) a merging method based on Multiword Expressions (MWE); (2) a mapping strategy based on synonymy between words; and (3) a re-ranking step that uses a POS-based language model (POS-LM) to identify the best compression candidates. We demonstrate the effectiveness of this approach on a dataset of clusters of English newswire sentences. The observed improvements in informativity and grammaticality of the generated compressions amount to an error reduction of up to 44% over state-of-the-art MSC systems.
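
For readers unfamiliar with the underlying machinery, the following is a minimal Python sketch of the word-graph idea that this line of MSC work builds on: related sentences are superimposed onto a shared directed graph of words, and low-cost start-to-end paths are taken as compression candidates. It is only an illustration under simplifying assumptions (nodes keyed by surface word, inverse-frequency edge weights, a made-up sentence cluster, and the networkx library); the paper's actual contributions, MWE-based merging, synonym mapping, and POS-LM re-ranking, are not implemented here.

from collections import defaultdict
import networkx as nx  # assumed dependency for this sketch

def build_word_graph(sentences):
    # One node per lower-cased surface word plus <START>/<END> markers;
    # identical words from different sentences collapse into the same node,
    # which is the core "redundancy" idea behind word-graph MSC.
    counts = defaultdict(int)
    for sent in sentences:
        tokens = ["<START>"] + sent.lower().split() + ["<END>"]
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    g = nx.DiGraph()
    for (a, b), c in counts.items():
        # Frequent word transitions become cheap edges (inverse-frequency weight).
        g.add_edge(a, b, weight=1.0 / c)
    return g

def compress(sentences, min_words=8):
    # Enumerate <START>-to-<END> paths from cheapest to costliest and keep the
    # first one that is long enough. A full system would score many candidates
    # and re-rank them; the paper adds a POS-based language model at that stage.
    g = build_word_graph(sentences)
    for path in nx.shortest_simple_paths(g, "<START>", "<END>", weight="weight"):
        words = path[1:-1]
        if len(words) >= min_words:
            return " ".join(words)
    return None

if __name__ == "__main__":
    cluster = [  # toy cluster of related sentences, made up for illustration
        "the prime minister announced a new economic plan on monday",
        "on monday the prime minister unveiled a new plan for the economy",
        "a new economic plan was announced by the prime minister",
    ]
    print(compress(cluster, min_words=5))
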

Cite this Paper


BibTeX
@InProceedings{pmlr-v63-ShafieiBavani24,
  title     = {An Efficient Approach for Multi-Sentence Compression},
  author    = {ShafieiBavani, Elaheh and Ebrahimi, Mohammad and Wong, Raymond K. and Chen, Fang},
  booktitle = {Proceedings of The 8th Asian Conference on Machine Learning},
  pages     = {414--429},
  year      = {2016},
  editor    = {Durrant, Robert J. and Kim, Kee-Eung},
  volume    = {63},
  series    = {Proceedings of Machine Learning Research},
  address   = {The University of Waikato, Hamilton, New Zealand},
  month     = {16--18 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v63/ShafieiBavani24.pdf},
  url       = {https://proceedings.mlr.press/v63/ShafieiBavani24.html},
  abstract  = {Multi Sentence Compression (MSC) is of great value to many real world applications, such as guided microblog summarization, opinion summarization and newswire summarization. Recently, word graph-based approaches have been proposed and become popular in MSC. Their key assumption is that redundancy among a set of related sentences provides a reliable way to generate informative and grammatical sentences. In this paper, we propose an effective approach to enhance the word graph-based MSC and tackle the issue that most of the state-of-the-art MSC approaches are confronted with: i.e., improving both informativity and grammaticality at the same time. Our approach consists of three main components: (1) a merging method based on Multiword Expressions (MWE); (2) a mapping strategy based on synonymy between words; (3) a re-ranking step to identify the best compression candidates generated using a POS-based language model (POS-LM). We demonstrate the effectiveness of this novel approach using a dataset made of clusters of English newswire sentences. The observed improvements on informativity and grammaticality of the generated compressions show an up to 44% error reduction over state-of-the-art MSC systems.}
}
Endnote
%0 Conference Paper
%T An Efficient Approach for Multi-Sentence Compression
%A Elaheh ShafieiBavani
%A Mohammad Ebrahimi
%A Raymond K. Wong
%A Fang Chen
%B Proceedings of The 8th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Robert J. Durrant
%E Kee-Eung Kim
%F pmlr-v63-ShafieiBavani24
%I PMLR
%P 414--429
%U https://proceedings.mlr.press/v63/ShafieiBavani24.html
%V 63
%X Multi Sentence Compression (MSC) is of great value to many real world applications, such as guided microblog summarization, opinion summarization and newswire summarization. Recently, word graph-based approaches have been proposed and become popular in MSC. Their key assumption is that redundancy among a set of related sentences provides a reliable way to generate informative and grammatical sentences. In this paper, we propose an effective approach to enhance the word graph-based MSC and tackle the issue that most of the state-of-the-art MSC approaches are confronted with: i.e., improving both informativity and grammaticality at the same time. Our approach consists of three main components: (1) a merging method based on Multiword Expressions (MWE); (2) a mapping strategy based on synonymy between words; (3) a re-ranking step to identify the best compression candidates generated using a POS-based language model (POS-LM). We demonstrate the effectiveness of this novel approach using a dataset made of clusters of English newswire sentences. The observed improvements on informativity and grammaticality of the generated compressions show an up to 44% error reduction over state-of-the-art MSC systems.
RIS
TY - CPAPER
TI - An Efficient Approach for Multi-Sentence Compression
AU - Elaheh ShafieiBavani
AU - Mohammad Ebrahimi
AU - Raymond K. Wong
AU - Fang Chen
BT - Proceedings of The 8th Asian Conference on Machine Learning
DA - 2016/11/20
ED - Robert J. Durrant
ED - Kee-Eung Kim
ID - pmlr-v63-ShafieiBavani24
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 63
SP - 414
EP - 429
L1 - http://proceedings.mlr.press/v63/ShafieiBavani24.pdf
UR - https://proceedings.mlr.press/v63/ShafieiBavani24.html
AB - Multi Sentence Compression (MSC) is of great value to many real world applications, such as guided microblog summarization, opinion summarization and newswire summarization. Recently, word graph-based approaches have been proposed and become popular in MSC. Their key assumption is that redundancy among a set of related sentences provides a reliable way to generate informative and grammatical sentences. In this paper, we propose an effective approach to enhance the word graph-based MSC and tackle the issue that most of the state-of-the-art MSC approaches are confronted with: i.e., improving both informativity and grammaticality at the same time. Our approach consists of three main components: (1) a merging method based on Multiword Expressions (MWE); (2) a mapping strategy based on synonymy between words; (3) a re-ranking step to identify the best compression candidates generated using a POS-based language model (POS-LM). We demonstrate the effectiveness of this novel approach using a dataset made of clusters of English newswire sentences. The observed improvements on informativity and grammaticality of the generated compressions show an up to 44% error reduction over state-of-the-art MSC systems.
ER -
APA
ShafieiBavani, E., Ebrahimi, M., Wong, R. K., & Chen, F. (2016). An Efficient Approach for Multi-Sentence Compression. Proceedings of The 8th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 63:414-429. Available from https://proceedings.mlr.press/v63/ShafieiBavani24.html.

Related Material

Download PDF: http://proceedings.mlr.press/v63/ShafieiBavani24.pdf