2019 Fiscal Year Final Research Report

Abstractive Generation of Paragraph Titles

Project/Area Number	16K00441
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Library and information science/Humanistic social informatics
Research Institution	Okayama Prefectural University
Principal Investigator	KIKUI Genichiro Okayama Prefectural University, Faculty of Computer Science and Systems Engineering, Professor (80395011)
Project Period (FY)	2016-04-01 – 2020-03-31
Keywords	automatic title assignment / automatic summarization / natural language processing / indexing
Outline of Final Research Achievements	This research aims to develop models that generate a title for each paragraph of an English text. A paragraph title is a short linguistic expression that indicates or summarizes the information in the given paragraph. A sequence of paragraph titles is a useful representation of the text, expressing its line of argumentation. In this work, we created a corpus of paragraph titles composed by humans. We found that, on average, 46% of the word tokens in a title do not appear in the corresponding paragraph, which means that ‘abstractive’ summarization is needed. We then applied state-of-the-art title generation models, such as encoder-decoder models and transformer models, to our corpus and found that two models achieved relatively good performance, with a ROUGE-1 score of 34, but their output was rated on average as ‘does not include main idea’ by human evaluators. This means that the corpus can provide a challenging task for abstractive title generation.
Free Research Field	Natural language processing
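Academic Significance and Societal Importance of the Research Achievements	The academic significance lies in three points. First, we focused on paragraph titles as a means of concisely making explicit the flow of argument in an expository text, and classified them into three types. Second, we built a corpus in which five or more paragraph titles were assigned to each paragraph of about 120 texts (786 paragraphs in total), and clarified its statistical properties and the limitations of existing methods. We expect the corpus to contribute to research in this field. Third, we presented a way of handling the word senses needed for automatic title generation, in particular a means of estimating the meaning of words not registered in the dictionary. The societal significance is that we clarified the role and properties of paragraph titles as a means of supporting the browsing of the overflowing volume of text information.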
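
The two figures quoted in the outline, the share of title tokens absent from the paragraph and the ROUGE-1 score, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the report does not state its tokenization or which ROUGE-1 variant (recall, precision, or F1) it used, so the lowercase alphanumeric tokenizer, the F1 variant, the function names, and the sample texts below are hypothetical and not taken from the project.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    Assumption: the report's actual tokenization is not specified.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def novel_token_ratio(title, paragraph):
    """Fraction of title tokens that never occur in the paragraph."""
    title_tokens = tokenize(title)
    if not title_tokens:
        return 0.0
    paragraph_vocab = set(tokenize(paragraph))
    novel = sum(1 for tok in title_tokens if tok not in paragraph_vocab)
    return novel / len(title_tokens)


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a generated title and a reference title.

    Assumption: F1 is used here; the report may have used another variant.
    """
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sample data, not from the corpus described above.
    paragraph = ("Solar panels have become cheaper every year, "
                 "so more households install them.")
    human_title = "Falling cost of solar power"
    system_title = "Solar panels get cheaper"

    print(f"novel-token ratio: {novel_token_ratio(human_title, paragraph):.2f}")
    print(f"ROUGE-1 F1: {rouge1_f1(system_title, human_title):.2f}")
```

In this toy case four of the five tokens in the human title are missing from the paragraph, which shows why purely extractive methods cannot reproduce such titles. Averaging the first function over all title and paragraph pairs of a corpus would give a statistic comparable in spirit to the 46% reported above, and ROUGE-1 is conventionally reported on a 0 to 100 scale, that is, the value computed here multiplied by 100.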