Generative Summarization Based on Stepwise Extraction and Rewriting

Research Project

Project/Area Number	19K20339
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Tokyo Institute of Technology
Principal Investigator	Kamigaito Hidetaka, Tokyo Institute of Technology, Institute of Innovative Research, Assistant Professor (40817649)
Project Period (FY)	2019-04-01 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥3,770,000 (Direct Cost: ¥2,900,000, Indirect Cost: ¥870,000); Fiscal Year 2020: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2019: ¥2,470,000 (Direct Cost: ¥1,900,000, Indirect Cost: ¥570,000)
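Keywords	automatic summarization / sentence extraction / sentence compression / natural language generation / document summarization / deep learning / BERT / attention mechanism / dependency tree / stepwise summarization / encoder-decoder / attention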
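Outline of Research at the Start	In this research, we implement a neural-network-based generative document summarizer that takes the stepwise process of summary generation into account, and verify whether it can summarize long documents correctly. We also examine what kind of summary is assumed internally during the generation process. We then compare the summary generation process assumed inside the model with the procedure humans follow when summarizing, and investigate where the two are similar and where they differ. Through this investigation, we also clarify which procedures and network structures are appropriate for generative summarization with neural networks.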
Outline of Final Research Achievements	To bring human-like stepwise summarization (sentence extraction, compression, and rewriting) to existing neural document summarization methods, we developed a robust sentence compressor that works with a conventional document summarization method across various domains. While investigating the sentence compressor, we found that pre-trained word vectors contribute to its performance. We also investigated knowledge graph embedding, which is needed to enhance word vectors with external knowledge, and provided a theoretical basis for selecting a suitable loss function when training knowledge graph embeddings. Finally, we incorporated our sentence compressor into the conventional document summarization method and observed improved automatic evaluation scores in the sentence-extraction summarization setting.
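Academic Significance and Societal Importance of the Research Achievements	Automatic document summarization is considered an important technology for helping readers select information in an Internet society where the number of digital documents keeps growing. Unlike existing neural-network-based document summarization methods, whose behavior during summary generation is hidden, this research makes the process by which a summary is actually generated explicit, which is useful because the desired summary output can be adjusted easily. In addition, because external knowledge can be reflected in the word information used during sentence compression, the method is expected to work in more domains than existing document compression methods. It is broadly applicable and useful in that it can target not only news articles but also blog articles, review posts, and similar text.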

Report

(3 results)

2020 Annual Research Report, Final Research Report (PDF)
2019 Research-status Report

Research Products
(4 results)

All Presentation (4 results) (of which Int'l Joint Research: 2 results)

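[Presentation] Unified Interpretation of Loss Functions in Knowledge Graph Embedding Learning (知識グラフ埋め込み学習における損失関数の統一的解釈) (2021)
- Author(s)
  Hidetaka Kamigaito, Katsuhiko Hayashi
- Organizer
  The 27th Annual Meeting of the Association for Natural Language Processing (NLP2021)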
- Related Report
  2020 Annual Research Report
[Presentation] Unified Interpretation of Softmax Cross-Entropy and Negative Sampling: With a Case Study for Knowledge Graph Embedding (2021)
- Author(s)
  Hidetaka Kamigaito and Katsuhiko Hayashi
- Organizer
  The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
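[Presentation] Sentence Compression with Syntactic Look-Ahead Based on a Hierarchical Attention Mechanism (階層的な注意機構に基づき統語的な先読みを行う文圧縮) (2020)
- Author(s)
  Hidetaka Kamigaito, Manabu Okumura
- Organizer
  The 243rd Meeting of the Special Interest Group on Natural Language Processing, Information Processing Society of Japan (IPSJ)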
- Related Report
  2019 Research-status Report
[Presentation] Syntactically Look-Ahead Attention Network for Sentence Compression (2020)
- Author(s)
  Hidetaka Kamigaito, Manabu Okumura
- Organizer
  Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
