2003 Fiscal Year Final Research Report Summary
Basic Studies on Automatic Text Summarization as an Aid for Human Intellectual Activities
Project/Area Number | 13680444 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Toyohashi University of Technology |
Principal Investigator | MASUYAMA Shigeru, Toyohashi University of Technology, Faculty of Engineering, Department of Knowledge-based Information Engineering, Professor (60173762) |
Project Period (FY) | 2001 – 2003 |
Keywords | automatic text summarization / multiple document summarization / user interaction / disabbreviation / paraphrasing / information access technology / natural language processing |
Research Abstract |
In this project, we focused on two themes: (1) sentence reduction for summarization and (2) multiple document summarization. The following results were obtained.

As for sentence reduction, we proposed a method for deleting adnominal verb phrases. The method is based on the observation that if the variety of verbs modifying a given noun is limited, then an adnominal verb phrase modifying that noun can easily be inferred from the noun itself and may therefore be deleted. This diversity of modifying verbs is measured by entropy. We also proposed a method for deleting multiple adnominal phrases, in which the deletability of an adnominal phrase is estimated from the importance of the noun in the phrase and from mutual information.

We developed GOLD, a multiple document summarization system. Previous experience shows that a good automatic summarization system can be built by appropriately combining a number of summarization techniques, so GOLD combines a variety of techniques, both conventional and newly introduced. Its evaluation results at TSC 2 of NTCIR 3 were satisfactory.

In multiple document summarization, the document set to be summarized usually covers multiple topics, but a user is not necessarily interested in all of them, so a summary customized to the user is needed. To meet this need, we developed a multiple document summarization system with user interaction: the system suggests keywords extracted from the document set, and the user chooses appropriate keywords among them. Its evaluation results at TSC 3 of NTCIR 4 were remarkable, particularly in content evaluation.

As a related result, we proposed a method for acquiring knowledge on disabbreviations of Japanese nouns from a single corpus. This knowledge is useful, e.g., for information retrieval, word sense disambiguation, and summarization.
|
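The entropy criterion for deleting adnominal verb phrases can be illustrated with a minimal Python sketch. For each noun n, it computes H(n) = -sum over v of p(v|n) log2 p(v|n), where p(v|n) is the relative frequency of verb v among the verbs observed modifying n; a low H(n) means the modifying verb is predictable from the noun. The input format (pre-parsed (verb, noun) pairs), the function names `modifier_entropy` and `is_deletable`, the base-2 logarithm, and the threshold of 1.0 bit are all hypothetical choices for illustration, not the project's actual implementation or parameters.

```python
import math
from collections import Counter, defaultdict


def modifier_entropy(pairs):
    """Entropy (in bits) of the distribution of verbs modifying each noun.

    pairs: iterable of (verb, noun) tuples, one per adnominal occurrence
    "verb phrase -> noun" (hypothetical pre-parsed corpus input).
    """
    counts = defaultdict(Counter)  # noun -> Counter of modifying verbs
    for verb, noun in pairs:
        counts[noun][verb] += 1

    entropy = {}
    for noun, verb_counts in counts.items():
        total = sum(verb_counts.values())
        # H(n) = -sum_v p(v|n) * log2 p(v|n)
        entropy[noun] = sum(
            -(c / total) * math.log2(c / total) for c in verb_counts.values()
        )
    return entropy


def is_deletable(noun, entropy, threshold=1.0):
    """Hypothetical rule: treat an adnominal verb phrase as deletable when the
    verbs modifying its head noun are predictable (low entropy).
    Nouns never seen in the corpus are kept (entropy treated as infinite)."""
    return entropy.get(noun, math.inf) <= threshold


# Toy corpus with English glosses standing in for Japanese adnominal phrases.
pairs = [("bake", "bread")] * 4 + [
    ("read", "book"), ("write", "book"), ("buy", "book"), ("sell", "book"),
]
h = modifier_entropy(pairs)
print(h)                         # {'bread': 0.0, 'book': 2.0}
print(is_deletable("bread", h))  # True
print(is_deletable("book", h))   # False
```

In this toy data, "bread" is only ever modified by "bake", so its entropy is 0 bits and a phrase such as "baked bread" becomes a deletion candidate, whereas "book" has four equally frequent modifying verbs (2 bits), so its modifier carries information and is kept.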