High Compression-Rate Automatic Summarization of Newspaper Articles Based on Combined Use of Significant Sentence Extraction and Sentence Compression
Project/Area Number |
16500077
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Electro-Communications |
Principal Investigator |
OZEKI Kazuhiko The University of Electro-Communications, Faculty of Electro-Communications, Professor (50214135)
|
Co-Investigator(Kenkyū-buntansha) |
TAKAGI Kazuyuki The University of Electro-Communications, Faculty of Electro-Communications, Research Associate (70272755)
|
Project Period (FY) |
2004 – 2006
|
Project Status |
Completed (Fiscal Year 2006)
|
Budget Amount |
¥3,000,000 (Direct Cost: ¥3,000,000)
Fiscal Year 2006: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 2005: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 2004: ¥1,600,000 (Direct Cost: ¥1,600,000)
|
Keywords | text summarization / sentence compression / phrase significance / inter-phrase dependency / phrase alignment / dependency path length / information retention / grammatical naturalness / dependency relation / conceptual distance |
Research Abstract |
1. This work uses a corpus of newspaper articles paired with manually written short summaries. The corpus provides information about how humans write short summaries. To extract that information effectively, the phrases of each original sentence must be aligned with those of its summary. We developed a phrase aligner that uses conceptual distance and inter-phrase dependency.
2. Before the research period, we estimated inter-phrase dependency strength from the distribution of dependency distance in the set of original sentences. That method, however, ignores the relationship between an original sentence and its summary. In this work, we estimated inter-phrase dependency strength from the relative frequency of phrase pairs that occur in the original sentence with a given dependency path length and remain in a modifier-modified relation in the corresponding summary. A subjective evaluation experiment showed a significant improvement in the quality of the compressed sentences.
3. In phrase-extraction-type sentence compression, which this research employs, phrases that are not in a modifier-modified relation in the original sentence sometimes appear to be in one in the compressed sentence. This can degrade the readability of compressed sentences. We devised a method that modifies the ending of the modifier phrase to improve readability. A subjective evaluation experiment showed that the method is effective.
4. We reformulated our sentence compression method in a probabilistic framework. The probability that a compressed sentence is generated from an original sentence involves quantities similar to phrase significance and inter-phrase dependency, which can be estimated from a training corpus. The probabilistic approach was shown to attain performance comparable to that of our earlier heuristic approach.
|
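The relative-frequency estimate described in item 2 of the abstract can be illustrated with a minimal sketch. It is not the project's implementation. The function name, the input format (one `(path_length, kept)` pair per phrase pair observed in an original sentence) and the choice of denominator (all phrase pairs with that path length in the original sentences) are assumptions made for illustration, and the toy data is invented.

```python
from collections import Counter


def estimate_dependency_strength(observations):
    """Estimate dependency strength per dependency path length.

    observations: iterable of (path_length, kept) tuples, one per phrase
    pair found in an original sentence. `kept` is True when the pair is
    still in a modifier-modified relation in the aligned summary.
    (Hypothetical input format, assumed for this sketch.)
    """
    total = Counter()  # phrase pairs seen at each path length
    kept = Counter()   # of those, pairs whose relation survives in the summary
    for path_length, is_kept in observations:
        total[path_length] += 1
        if is_kept:
            kept[path_length] += 1
    # Relative frequency: surviving pairs / all pairs, per path length.
    return {d: kept[d] / total[d] for d in total}


# Invented toy data: (dependency path length, relation kept in summary?)
toy = [
    (1, True), (1, True), (1, False),
    (2, True), (2, False), (2, False), (2, False),
    (3, False),
]
strength = estimate_dependency_strength(toy)
```

In this toy data, pairs at path length 1 keep their relation in two of three cases, so they receive a higher strength than pairs at path length 2 or 3. A real estimate would be computed from the phrase-aligned article and summary corpus.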
Report
(4 results)
Research Products
(19 results)