Study on Information Retrieval based on Similarity Calculation of Intra-Document Structure
Project/Area Number | 11680383 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Yokohama National University |
Principal Investigator | MORI Tatsunori, Yokohama National University, Faculty of Engineering, Associate Professor (70212264) |
Co-Investigator (Kenkyū-buntansha) | NAKAGAWA Hiroshi, University of Tokyo, Information Technology Center, Professor (20134893) |
Project Period (FY) | 1999 – 2000 |
Project Status | Completed (Fiscal Year 2000) |
Budget Amount |
¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2000: ¥1,100,000 (Direct Cost: ¥1,100,000)
Fiscal Year 1999: ¥2,500,000 (Direct Cost: ¥2,500,000)
|
Keywords | Retrieval of similar documents / Extraction of Numerical Expressions / Extraction of Named Entities / Question Answering / Information Retrieval / Information Extraction |
Research Abstract |
The purpose of this research is to establish a method for "content"-based information retrieval. In our research, the "content" of a document is regarded as the combination of the following items: (a) the logical structure of the document, annotated by tags; (b) its text; and (c) information extracted from it by Information Extraction technology. Over the two-year project, we obtained the following results:
1. Extraction of intra-document structure based on similarity among passages: By using not only intra-document information but also inter-document information, we improved the effectiveness of retrieving the relevant portions of documents.
2. A multi-strategy named entity recognizer based on machine learning and extraction patterns: By combining these two types of strategies for the named entity task, we improved the accuracy of named entity recognition.
3. Extraction of numerical information and its application to Question Answering: We regard Question Answering as one of the ideal forms of content retrieval. Named entities correspond to the answers to 4W-type questions, whereas numerical expressions correspond to the answers to H-type questions. We therefore proposed, as part of a QA system, a method for extracting numerical expressions together with their context.
|
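The abstract does not state which similarity measure was used or how inter-document information was combined with intra-document information. As a minimal sketch under those unknowns, the code below assumes TF-IDF cosine similarity, takes IDF from a document collection (standing in for inter-document information), and merges adjacent passages whose similarity reaches a hypothetical threshold of 0.1 (standing in for intra-document structure). The function names `tokenize`, `document_frequencies`, `vectorize`, `cosine`, `segment`, and `best_segment` are hypothetical illustrations, not the project's actual method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (a hypothetical stand-in for real morphological analysis)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_frequencies(collection):
    """Inter-document information: number of documents containing each term."""
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))
    return df, len(collection)


def vectorize(text, df, n_docs):
    """TF-IDF weights; the smoothed IDF still gives terms unseen in the collection a weight."""
    tf = Counter(tokenize(text))
    return {t: c * (math.log((n_docs + 1) / (df[t] + 1)) + 1.0) for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(passages, df, n_docs, threshold=0.1):
    """Intra-document structure: group adjacent passages whose similarity >= threshold."""
    if not passages:
        return []
    vecs = [vectorize(p, df, n_docs) for p in passages]
    segments = [[0]]
    for i in range(1, len(passages)):
        if cosine(vecs[i - 1], vecs[i]) >= threshold:
            segments[-1].append(i)  # similar to the previous passage: same segment
        else:
            segments.append([i])  # similarity drops: start a new segment
    return segments


def best_segment(query, passages, collection, threshold=0.1):
    """Return (score, passage indices) of the segment most similar to the query, or None."""
    df, n_docs = document_frequencies(collection)
    q = vectorize(query, df, n_docs)
    scored = []
    for seg in segment(passages, df, n_docs, threshold):
        text = " ".join(passages[i] for i in seg)
        scored.append((cosine(q, vectorize(text, df, n_docs)), seg))
    return max(scored, key=lambda pair: pair[0], default=None)


if __name__ == "__main__":
    passages = [
        "The cat sat on the mat.",
        "The cat chased a mouse.",
        "Stock prices fell sharply today.",
        "Investors sold stock as prices fell.",
    ]
    # Toy setting: each passage doubles as a pseudo-document of the collection.
    df, n_docs = document_frequencies(passages)
    print(segment(passages, df, n_docs))  # [[0, 1], [2, 3]]
    print(best_segment("stock prices", passages, passages)[1])  # [2, 3]
```

Returning a whole segment rather than a single passage reflects the idea of retrieving a relevant *portion* of a document. A real system would draw the IDF statistics from a large external collection and tune the threshold rather than fixing it at 0.1.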
Report | 3 results |
Research Products | 15 results |