1998 Fiscal Year Final Research Report Summary
Development of an Integrated Tool System for Technical Term Auto-Extraction and Knowledge Acquisition from Corpora
Project/Area Number |
08558027
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | Developmental Research (展開研究) |
Research Field |
Intelligent informatics
|
Research Institution | The University of Tokyo |
Principal Investigator |
TSUJI Junichi The University of Tokyo, Graduate School of Science, Professor (20026313)
|
Co-Investigator(Kenkyū-buntansha) |
IKEHARA Satoru The University of Tottori, Faculty of Engineering, Professor (70283968)
KAGEURA Kyo National Center for Science Information Systems, Associate Professor (00211152)
KOYAMA Teruo National Center for Science Information Systems, Professor (80124410)
KIYONO Masaki Matsushita Electric Industrial Company, Tokyo Research Institute, Researcher
|
Project Period (FY) |
1996 – 1998
|
Keywords | knowledge acquisition / semantic classification / database / technical term extraction |
Research Abstract |
The goal of this project was to provide systems that acquire terminological knowledge from texts semi-automatically. To this end, we developed the following three systems. 1. Central database for terminology: We created a terminology database system by integrating the text/lexicon database developed by EDR with the programming language LiLFeS, which was developed at the University of Tokyo for easy and flexible treatment of linguistic entities. This system allows systematic maintenance of the knowledge acquired by the following two systems. 2. Systems for term recognition: The NACSIS research group introduced a statistical metric for identifying technical terms in texts and built programs that recognize terms using this metric. The University of Tokyo group approached the same problem from a different perspective and provided a term recognition method based on character n-grams. These programs are integrated so that they work as a front end to the database system described in 1. 3. Systems for acquiring ontological knowledge about terms: The University of Tokyo group developed programs that obtain semantic classifications of words from surface clues appearing in texts. The Matsushita research group developed a similar technique that uses deeper syntactic structures of texts. These systems were applied to genome-related documents, news articles about stock markets, and other texts.
|
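The abstract mentions a term recognition method based on character n-grams but gives no details of it. The sketch below is therefore only an illustration of the general idea, not the project's actual method. It assumes a simple heuristic chosen for this example: a candidate string scores higher when its character trigrams are relatively more frequent in a domain corpus than in a general corpus. The function names (`char_ngrams`, `ngram_counts`, `termhood`), the add-one smoothing, and the toy corpora are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(docs, n):
    """Count character n-grams over a list of documents (lowercased)."""
    counts = Counter()
    for doc in docs:
        counts.update(char_ngrams(doc.lower(), n))
    return counts


def termhood(candidate, domain_counts, general_counts, n=3):
    """Mean log-ratio of smoothed domain vs. general n-gram frequency.

    Illustrative heuristic only (an assumption of this sketch):
    positive scores suggest domain-specific character sequences.
    """
    grams = char_ngrams(candidate.lower(), n)
    if not grams:
        return 0.0
    # Shared vocabulary size for add-one smoothing (+1 for unseen n-grams).
    vocab = len(set(domain_counts) | set(general_counts)) + 1
    d_total = sum(domain_counts.values())
    g_total = sum(general_counts.values())
    score = 0.0
    for g in grams:
        p_domain = (domain_counts[g] + 1) / (d_total + vocab)
        p_general = (general_counts[g] + 1) / (g_total + vocab)
        score += math.log(p_domain / p_general)
    return score / len(grams)


# Toy corpora, invented for this example.
domain = ngram_counts(
    ["the genome sequence encodes a protein",
     "protein kinase binds the genome"], 3)
general = ngram_counts(
    ["the stock market fell today",
     "the market price of the stock rose"], 3)

for word in ["genome", "market"]:
    print(word, round(termhood(word, domain, general), 3))
```

With these toy corpora, "genome" receives a positive score and "market" a negative one, because the domain corpus here is the genome text. A realistic system would need much larger corpora and a principled way to propose candidate boundaries, which this sketch does not attempt.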