Project/Area Number |
09480063
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Yokohama National University |
Principal Investigator |
NAKAGAWA Hiroshi Yokohama National University, Faculty of Engineering, Professor (20134893)
|
Co-Investigator (Kenkyū-buntansha) |
MORI Tatsunori Yokohama National University, Faculty of Engineering, Associate Professor (70212264)
|
Project Period (FY) |
1997 – 1998
|
Project Status |
Completed (Fiscal Year 1998)
|
Budget Amount |
¥1,800,000 (Direct Cost: ¥1,800,000)
Fiscal Year 1998: ¥1,800,000 (Direct Cost: ¥1,800,000)
|
Keywords | Term Extraction / Information Extraction / Hypertext / Information Retrieval / Manual / Natural Language Processing / Index Term Extraction |
Research Abstract |
In this electronic age there is a pressing need for technical manuals in which readers can easily find the part they want to read. Simply putting a manual into electronic form does not achieve this; a number of technical advances are still required. In this research we developed four essential technologies for this purpose. (1) Term Extraction: Technical terms are the most important entry points for reading and understanding the contents of a manual. Until now, term extraction has been done manually, so we aimed at an automatic term extraction system. Our system is based on the structure of compound nouns and on the moving window method, both of which are new ideas. The performance of the proposed system is superior to that of previously proposed methods. (2) Numerical Information Extraction: Numerical information, which is essential in technical documents, is extracted automatically by a language pattern matching method that we developed in this research. (3) Compound Word Based Information Retrieval: Because the majority of terms in technical manuals are compound words, an information retrieval engine that exploits the nature of compound words is necessary. Our retrieval engine outperforms the traditional tf*idf based method on the test collection BMIR-J1. (4) Automatic Hypertextization: An electronic manual in which related parts are linked is extremely useful for novices. We developed a system that finds and sets links connecting two parts with similar contents. Similarity is calculated based on word co-occurrence and word chain techniques.
|
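The linking step in (4) can be illustrated with a minimal sketch. It is not the project's method: the abstract does not specify how word co-occurrence and word chains are combined, so plain cosine similarity over word counts is swapped in as a stand-in. The function names (`tokenize`, `cosine`, `propose_links`), the threshold value, and the sample manual sections are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def propose_links(sections, threshold=0.5):
    """Return (name_a, name_b, score) for section pairs at or above threshold.

    Assumption: similarity is bag-of-words cosine, a simple stand-in for the
    co-occurrence and word chain measure mentioned in the abstract.
    """
    vectors = {name: Counter(tokenize(text)) for name, text in sections.items()}
    links = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            links.append((a, b, score))
    return links


# Hypothetical manual sections, used only for illustration.
sections = {
    "install": "connect the printer cable and install the printer driver",
    "troubleshoot": "reinstall the printer driver if the printer cable is loose",
    "warranty": "the warranty covers one year from purchase",
}

if __name__ == "__main__":
    for a, b, score in propose_links(sections):
        print(f"link {a} <-> {b} (similarity {score:.2f})")
```

With these sample sections, only the installation and troubleshooting parts share enough vocabulary to be linked. A practical system would at least remove function words such as "the" before comparing, since they inflate the similarity of unrelated parts.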