2006 Fiscal Year Final Research Report Summary
High Compression-Rate Automatic Summarization of Newspaper Articles Based on Combined Use of Significant Sentence Extraction and Sentence Compression
Project/Area Number |
16500077
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Electro-Communications |
Principal Investigator |
OZEKI Kazuhiko The University of Electro-Communications, Faculty of Electro-Communications, Professor (50214135)
|
Co-Investigator(Kenkyū-buntansha) |
TAKAGI Kazuyuki The University of Electro-Communications, Faculty of Electro-Communications, Research Associate (70272755)
|
Project Period (FY) |
2004 – 2006
|
Keywords | text summarization / sentence compression / phrase significance / inter-phrase dependency / phrase alignment / dependency path length / information retention / grammatical naturalness |
Research Abstract |
1. This work uses a corpus of newspaper articles paired with hand-made short summaries, which provides information about how humans make short summaries. To extract that information effectively, phrases in each original sentence must be aligned with phrases in its summary. We developed a phrase aligner that makes use of conceptual distance and inter-phrase dependency.
2. Before the research period, we estimated inter-phrase dependency strength from the distribution of dependency distance in the set of original sentences. That method, however, ignores the relationship between an original sentence and its summary. In this work, we estimated inter-phrase dependency strength from the relative frequency of phrase pairs that have a given dependency path length in the original sentence and remain in a modifier-modified relation in the corresponding summary. A subjective evaluation experiment showed significant improvement in the quality of compressed sentences.
3. In phrase-extraction-type sentence compression, which this research employs, phrases that are not in a modifier-modified relation in the original sentence sometimes appear to be in such a relation in the compressed sentence. This phenomenon may degrade the readability of compressed sentences. We devised a method that modifies the ending of the modifier phrase to improve readability. A subjective evaluation experiment showed the effectiveness of the method.
4. We reformulated our sentence compression method in a probabilistic framework. In calculating the probability that a compressed sentence is generated from an original sentence, quantities similar to phrase significance and inter-phrase dependency appear, and these can be estimated from a training corpus. This probabilistic approach was shown to attain performance comparable to that of our former, heuristic approach.
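The relative-frequency estimate of inter-phrase dependency strength described in item 2 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual implementation: the function name `estimate_dependency_strength`, the input format (one `(path_length, kept_in_summary)` tuple per aligned phrase pair), and the choice of normalizing by all phrase pairs with the same path length are hypothetical.

```python
from collections import Counter


def estimate_dependency_strength(observations):
    """Estimate dependency strength per dependency path length.

    observations: iterable of (path_length, kept_in_summary) tuples, one per
    phrase pair in an original sentence (hypothetical input format).
    kept_in_summary is True when the pair remains in a modifier-modified
    relation in the aligned summary.
    """
    total = Counter()  # phrase pairs seen at each path length
    kept = Counter()   # pairs whose relation survives in the summary
    for path_length, kept_in_summary in observations:
        total[path_length] += 1
        if kept_in_summary:
            kept[path_length] += 1
    # Relative frequency per path length (assumed normalization).
    return {d: kept[d] / total[d] for d in total}
```

Under this reading, a path length at which phrase pairs rarely stay linked in the human summaries receives a low strength, so a compressed sentence that joins such a pair would be penalized.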
|