2003 Fiscal Year Final Research Report Summary
The Use of Internet Corpus in Natural Language Processing
Project/Area Number |
14580411
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Electro-Communications |
Principal Investigator |
FURUGORI Teiji Computer Science, The Univ. of Electro-comm., Faculty of Electro-Communications, Professor (80114932)
|
Project Period (FY) |
2002 – 2003
|
Keywords | Natural Language Processing / Information Extraction / Internet Corpus / Automatic Summarization / Machine Translation / Structural Analysis / Compound Word Analysis |
Research Abstract |
Textual materials on the Internet, or the Internet corpus, are an important and valuable language resource for natural language processing. In this research, we tried to use it in devising a method for analyzing compound words in Japanese, a writer's aid program for translating Japanese into English, and an automatic summarization system for newspaper articles on sassho-jiken (incidents of killing or injury). The approach we take to natural language processing is statistical, not linguistic-theoretical. This approach runs into a difficulty that requires a solution to the sparse data problem: whatever result we obtain will not be reliable if it is derived from an insufficient amount of data. The data on the Internet are practically infinite, and our research has shown that the Internet corpus can be used effectively in the areas we dealt with. At the same time, however, it has revealed a problem: the data on the Internet are not well formed, and a mechanism for eliminating "junk" data would be a necessary step in many language processing systems.
|
Research Products
(13 results)