2020 Fiscal Year Final Research Report

Generative Summarization Based on Stepwise Extraction and Rewriting

Research Project

Project/Area Number	19K20339
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Tokyo Institute of Technology
Principal Investigator	Kamigaito Hidetaka Tokyo Institute of Technology, Institute of Innovative Research, Assistant Professor (40817649)
Project Period (FY)	2019-04-01 – 2021-03-31
Keywords	automatic summarization / sentence extraction / sentence compression / natural language generation
Outline of Final Research Achievements	To bring human-like stepwise summarization (sentence extraction, compression, and rewriting) to existing neural-network-based document summarization methods, we developed a robust sentence compressor that works with a conventional document summarization method across various domains. While investigating the sentence compressor, we found that pre-trained word vectors contribute to improved performance. We also investigated knowledge graph embedding, which is needed to enhance word vectors with external knowledge, and provided a theoretical basis for selecting a suitable loss function when training knowledge graph embeddings. Finally, we incorporated our sentence compressor into the conventional document summarization method and observed improved automatic evaluation scores in the sentence-extraction summarization setting.
Free Research Field	Natural language processing
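Academic Significance and Societal Importance of the Research Achievements	Automatic document summarization is considered an important technology for helping readers decide which information to keep or discard in an internet society where the volume of digital documents keeps growing. Unlike existing neural-network-based document summarization methods, whose behavior during summary generation is hidden, this research makes the process by which a summary is actually generated explicit, which is useful because it makes the desired summary output easy to adjust. In addition, because external knowledge can be reflected in the word information used during sentence compression, the method can be expected to work in more domains than existing document compression methods. Its range of application is therefore broad: it can target not only news articles but also blog posts, review postings, and similar text.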