2003 Fiscal Year Final Research Report Summary
Analysis and Retrieval of Printed and Electronic Documents for Recycle of Information
Project/Area Number |
14580453
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Information systems (including information and library science)
|
Research Institution | Osaka Prefecture University |
Principal Investigator |
KISE Koichi, Osaka Prefecture University, Graduate School of Engineering, Associate Professor (80224939)
|
Project Period (FY) |
2002 – 2003
|
Keywords | information retrieval / recycle of information / document analysis / document image retrieval / information extraction / WWW / passage retrieval / embedding data |
Research Abstract |
Recycle of information is the process of reproducing useful information from materials obtained by decomposing the compound information contained in Web pages and documents. In this research we investigated the recycling of information from both printed and electronic documents, as well as the "reuse" of information as a step that precedes recycling. The results of this research are summarized as follows.
1. Retrieval of parts of document images and its application to question answering: As a method of reusing printed documents, we proposed a method of retrieving parts of document images based on two-dimensional density distributions of keywords. Experimental results on various English and Japanese documents show the effectiveness of the proposed method. Some basic functions of question answering were also implemented on top of this retrieval method. Question answering is the process of locating possible answers in documents in response to questions written in natural language, and is thus a kind of information recycling. Experimental results on English documents show that the method is capable of locating correct answers at the first rank for about half of the questions.
2. Embedding and recovery of electronic data on printed documents: As another approach to reusing information on printed documents, we implemented a method of embedding text information in documents at the time they are printed. 4 KB of data were successfully embedded in and recovered from B5 pages while tolerating 20% reading errors.
3. Passage retrieval of electronic documents and its application to question answering: We also proposed a method of retrieving parts of electronic documents based on the density distributions of keywords. It was likewise applied to question answering and shown to be capable of locating correct answers at the top rank for half of the questions.
4. Information extraction from electronic documents: As a way of recycling information from electronic documents, we proposed a method of tabulating information contained in documents. We applied this method to 7,000 Japanese news articles to extract personal profile information and showed its effectiveness. In addition, we evaluated basic methods for information extraction both from Web pages and with images.
|
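The abstract describes passage retrieval based on density distributions of keywords but gives no implementation details. The following is a minimal one-dimensional sketch of the general idea for electronic text, not the project's actual method. The Hann-shaped window, the window size, and the function names `keyword_density` and `best_passage` are all assumptions made for illustration. The document image variant in the abstract uses two-dimensional distributions, which this sketch does not cover.

```python
import math


def keyword_density(tokens, keywords, window=5):
    """Return one smoothed keyword-density value per token position.

    Assumption: a Hann-shaped window is used for smoothing; the abstract
    does not say which window shape or size the project used.
    """
    half = window // 2
    offsets = range(-half, half + 1)
    # 1.0 where the token is a query keyword, 0.0 elsewhere
    hits = [1.0 if t.lower() in keywords else 0.0 for t in tokens]
    # Weight is 1.0 at the centre and decreases towards the window edges
    weights = [0.5 * (1 + math.cos(math.pi * k / (half + 1))) for k in offsets]
    density = []
    for i in range(len(tokens)):
        total = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(tokens):
                total += w * hits[j]
        density.append(total)
    return density


def best_passage(tokens, keywords, window=5):
    """Return (start, end, passage_tokens) centred on the density peak."""
    if not tokens:
        return 0, 0, []
    density = keyword_density(tokens, keywords, window)
    peak = max(range(len(density)), key=density.__getitem__)
    half = window // 2
    start = max(0, peak - half)
    end = min(len(tokens), peak + half + 1)
    return start, end, tokens[start:end]


if __name__ == "__main__":
    text = ("the cat sat on the mat while the grant report "
            "discussed passage retrieval of documents")
    tokens = text.split()
    # Keywords are expected in lower case
    start, end, passage = best_passage(tokens, {"passage", "retrieval"})
    print(start, end, " ".join(passage))
```

Positions where several query keywords occur close together receive a high density, so the passage around the peak is returned as the candidate answer region. A real system would presumably also weight keywords by importance, which this sketch omits.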
Research Products
(38 results)