Project/Area Number |
13680433
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Tokyo |
Principal Investigator |
ISHIZUKA Mitsuru, The University of Tokyo, Graduate School of Information Science and Technology, Professor (50114369)
|
Project Period (FY) |
2001 – 2003
|
Project Status |
Completed (Fiscal Year 2003)
|
Budget Amount |
¥4,100,000 (Direct Cost: ¥4,100,000)
Fiscal Year 2003: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2002: ¥1,300,000 (Direct Cost: ¥1,300,000)
Fiscal Year 2001: ¥1,400,000 (Direct Cost: ¥1,400,000)
|
Keywords | Natural Language Expression / Knowledge Representation / Inference / Keyword Extraction / Important Sentence Extraction / Knowledge Sharing / Intelligent WWW / Knowledge Sharing and Integration / Knowledge Management |
Research Abstract |
Large volumes of electronic documents, consisting mostly of natural language text, are now stored and circulated through the World Wide Web (WWW) and similar channels. Although they contain a wide variety of useful knowledge, it has been difficult for both people and computers to fully use them as knowledge, for example by deriving an answer from a combination of multiple pieces of knowledge. As one approach to this problem, we have proposed and developed a knowledge representation and inference scheme called "Concept Chemical Representation (CCR)". Because this representation is close to natural language expression, natural language sentences can be converted into it conveniently. To convert natural language texts into CCR efficiently, it is necessary to focus on meaningful sentences and ignore the rest, so that useless components do not enter the CCR knowledge base. Accordingly, we have developed the following methods for extracting keywords and important sentences from a document: 1) keyword extraction using the deviation statistics of word co-occurrence; 2) keyword extraction using the small-world structure of word co-occurrence; 3) keyword extraction based on term activities, measured following the human cognitive process of term recognition. Most existing methods are based on TF*IDF (term frequency * inverse document frequency); the three methods above are original and take different approaches. They can be applied to English as well as Japanese text.
|
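As an illustration of method 1), the sketch below scores each term by how far its sentence-level co-occurrence with a document's most frequent terms deviates from what independence would predict, using a chi-square-style sum. It is a simplified reading of "deviation statistics of word co-occurrence", not the project's implementation. The function name `cooccurrence_deviation_scores`, the `num_frequent` parameter, the pre-tokenized input, and the sentence-count-based expectation are all assumptions made for this example.

```python
from collections import Counter


def cooccurrence_deviation_scores(sentences, num_frequent=10):
    """Score terms by chi-square-style deviation of co-occurrence (hypothetical helper).

    sentences: list of token lists, one list per sentence (assumed pre-tokenized).
    Returns a dict mapping each term to a non-negative score; a higher score
    means the term co-occurs with the frequent terms in a more biased way.
    """
    n_sent = len(sentences)
    sent_sets = [set(s) for s in sentences]

    # The reference set: the most frequent terms in the document.
    term_freq = Counter(t for s in sentences for t in s)
    frequent = [t for t, _ in term_freq.most_common(num_frequent)]

    # Number of sentences in which each term appears.
    sent_count = Counter(t for s in sent_sets for t in s)

    # Observed co-occurrence: sentences containing both w and frequent term g.
    cooc = Counter()
    for s in sent_sets:
        present = [g for g in frequent if g in s]
        for w in s:
            for g in present:
                if g != w:
                    cooc[(w, g)] += 1

    scores = {}
    for w in sent_count:
        chi2 = 0.0
        for g in frequent:
            if g == w:
                continue
            # Expected co-occurrence if w and g appeared independently.
            expected = sent_count[w] * sent_count[g] / n_sent
            observed = cooc[(w, g)]
            chi2 += (observed - expected) ** 2 / expected
        scores[w] = chi2
    return scores


if __name__ == "__main__":
    doc = [
        ["the", "cat", "mat"],
        ["the", "dog", "mat"],
        ["the", "cat", "dog"],
        ["the", "bird"],
    ]
    ranked = sorted(cooccurrence_deviation_scores(doc, num_frequent=3).items(),
                    key=lambda kv: kv[1], reverse=True)
    for term, score in ranked:
        print(f"{term}\t{score:.3f}")
```

Unlike TF*IDF, a score of this kind needs only the single document being analyzed, not a reference corpus. A term that appears in every sentence co-occurs with everything exactly as expected, so it scores zero under this sketch.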