Budget Amount
¥3,500,000 (Direct Cost : ¥3,500,000)
Fiscal Year 2003 : ¥1,200,000 (Direct Cost : ¥1,200,000)
Fiscal Year 2002 : ¥1,100,000 (Direct Cost : ¥1,100,000)
Fiscal Year 2001 : ¥1,200,000 (Direct Cost : ¥1,200,000)
In this project, we focused on two themes: (1) sentence reduction for summarization and (2) multiple document summarization. The following results were obtained.
For sentence reduction, we proposed a method for deleting adnominal verb phrases. The method is based on the observation that when the set of verbs that modify a given noun is limited, the adnominal verb phrase can easily be inferred from the noun alone and may therefore be deleted. The diversity of modifying verbs is measured by entropy. We also proposed a method for deleting multiple adnominal phrases, in which the deletability of each adnominal phrase is estimated from the importance of the noun in the phrase and from mutual information.
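The entropy criterion described above can be illustrated with a minimal sketch. It assumes that, for each noun, a list of the verbs observed modifying it has already been collected from a corpus. The function names and the threshold value are hypothetical and are not taken from the project; the sketch only shows how low entropy signals a predictable, and hence deletable, adnominal verb phrase.

```python
import math
from collections import Counter


def verb_entropy(verbs):
    """Shannon entropy (in bits) of the verbs observed modifying one noun."""
    counts = Counter(verbs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_deletable(verbs, threshold=1.0):
    """Hypothetical rule: low verb diversity means the phrase is easy to infer.

    The threshold of 1.0 bit is an illustrative assumption, not a value
    reported by the project.
    """
    return verb_entropy(verbs) < threshold


# A noun almost always modified by the same verb: low entropy.
predictable = ["drink", "drink", "drink", "drink"]
# A noun modified by many different verbs: high entropy.
diverse = ["buy", "sell", "break", "paint"]

print(verb_entropy(predictable), is_deletable(predictable))
print(verb_entropy(diverse), is_deletable(diverse))
```

In practice the verb lists would come from a parsed corpus, and the threshold would have to be tuned on held-out data.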
We developed GOLD, a multiple document summarization system. Previous experience shows that a good automatic summarization system can be built by appropriately combining a number of summarization techniques. We therefore built GOLD by combining a variety of summarization techniques, both conventional and newly introduced. The evaluation results at TSC 2 of NTCIR 3 were satisfactory.
In multiple document summarization, the document set to be summarized usually covers multiple topics. However, a user is not necessarily interested in all of them, so a user-customized summary is needed. To meet this need, we developed a multiple document summarization system with user interaction. The system suggests keywords extracted from the document set to be summarized, and the user chooses the appropriate keywords among them. The evaluation results at TSC 3 of NTCIR 4 were remarkable, particularly in content evaluation.
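The interaction loop described above (suggest keywords, let the user choose, then summarize accordingly) might be sketched as follows. This is an illustrative assumption, not the project's system: keywords are ranked by document frequency, sentences are scored by how many chosen keywords they contain, and the tokenizer handles only simple English text, whereas the original work targeted Japanese. All names (`suggest_keywords`, `summarize`, `STOPWORDS`) are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "for", "on"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def suggest_keywords(documents, k=5):
    """Suggest the k words that appear in the most documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return [word for word, _ in doc_freq.most_common(k)]


def summarize(documents, chosen, max_sentences=2):
    """Return the sentences containing the most user-chosen keywords."""
    chosen = set(chosen)
    scored = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            score = len(chosen & set(tokenize(sentence)))
            if score > 0:
                scored.append((score, sentence))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps document order on ties
    return [sentence for _, sentence in scored[:max_sentences]]


docs = [
    "The earthquake hit the city. Markets fell sharply.",
    "Rescue teams reached the city after the earthquake.",
]
print(suggest_keywords(docs, k=2))    # keywords offered to the user
print(summarize(docs, ["markets"]))   # summary focused on the user's choice
```

A user interested only in the economic topic would pick `markets` from the suggestions and receive a summary that omits the rescue coverage.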
As a related result, we proposed a method for acquiring, from a single corpus, knowledge about the disabbreviation of Japanese nouns. This knowledge is useful for, e.g., information retrieval, word sense disambiguation, and summarization.