Hybrid Summarization with Semantic Weighting Reward and Latent Structure Detector
Proceedings of The 13th Asian Conference on Machine Learning, PMLR 157:1739-1754, 2021.
Abstract
Text summarization has been a significant challenge in the Natural Language Processing (NLP) field. Approaches to text summarization can be roughly divided into two main paradigms: extractive and abstractive. The former captures the most representative snippets in a document, while the latter generates a summary by understanding the latent meaning of the material with a language generation model. Recently, studies have found that jointly employing extractive and abstractive summarization models can exploit their complementary strengths, producing summaries that are both concise and informative. However, reinforced summarization models mainly depend on a ROUGE-based reward, which can only quantify the extent of word matching rather than semantic matching between the document and the summary. Meanwhile, documents in real-world applications often contain redundant or noisy content due to repeated or irrelevant information. Therefore, depending only on a ROUGE-based reward to optimize reinforced summarization models may lead to biased summary generation. In this paper, we propose a novel deep Hybrid Summarization model with a semantic weighting Reward and a latent structure Detector (HySRD). Specifically, HySRD introduces a new reward mechanism that simultaneously takes advantage of semantic and syntactic information in documents and summaries. To model semantics accurately, a latent structure detector is designed to incorporate high-level latent structures into the sentence representation for information selection. Extensive experiments have been conducted on two well-known benchmark datasets, CNN/Daily Mail (short input documents) and BigPatent (long input documents). The automatic evaluation shows that our approach significantly outperforms state-of-the-art hybrid summarization models.
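To make the idea of a hybrid reward concrete, the sketch below shows one plausible way to mix a word-overlap (syntactic) signal with a semantic-similarity signal. This is a minimal illustration, not the authors' implementation: the unigram ROUGE approximation, the user-supplied embedding function, the mixing weight alpha, and all function names are assumptions introduced here for clarity.

```python
# Minimal sketch of a hybrid reward mixing a syntactic signal (ROUGE-1 F1,
# approximated by unigram overlap) with a semantic signal (cosine similarity
# of sentence embeddings). The `embed` callable and `alpha` weight are
# illustrative assumptions, not details from the paper.
from collections import Counter
from math import sqrt
from typing import Callable, Sequence


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate summary and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_reward(candidate: str,
                  reference: str,
                  embed: Callable[[str], Sequence[float]],
                  alpha: float = 0.5) -> float:
    """Weighted sum of a semantic and a ROUGE-based reward component.

    `embed` is assumed to map a sentence to a dense vector (e.g. from a
    pretrained sentence encoder); `alpha` trades off the two signals.
    """
    semantic = cosine(embed(candidate), embed(reference))
    syntactic = rouge1_f1(candidate, reference)
    return alpha * semantic + (1 - alpha) * syntactic
```

In a reinforced summarization setup, such a reward would replace a pure ROUGE score when computing the policy-gradient signal, so that paraphrased but semantically faithful summaries are not penalized for low word overlap.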