Building a Japanese Parsed Corpus
Project/Area Number |
07558046
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | Experimental Research (試験) |
Research Field |
Intelligent informatics
|
Research Institution | KYOTO UNIVERSITY |
Principal Investigator |
NAGAO Makoto Kyoto University, Graduate School of Engineering, Department of Electronics and Communication, Professor (30025960)
|
Co-Investigator(Kenkyū-buntansha) |
TSUNODA Tatsuhiko Kyoto University, Graduate School of Engineering, Department of Electronics and Communication, Instructor (10273468)
MARUYAMA Hiroshi IBM Japan, Ltd., Tokyo Research Laboratory, Researcher
KUROHASHI Sadao Kyoto University, Graduate School of Engineering, Department of Electronics and Communication, Instructor (50263108)
|
Project Period (FY) |
1995 – 1996
|
Project Status |
Completed (Fiscal Year 1996)
|
Budget Amount |
¥6,300,000 (Direct Cost: ¥6,300,000)
Fiscal Year 1996: ¥2,200,000 (Direct Cost: ¥2,200,000)
Fiscal Year 1995: ¥4,100,000 (Direct Cost: ¥4,100,000)
|
Keywords | Natural Language Processing / Text Corpus / Morphological Analysis / Parsing |
Research Abstract |
The goal of the project was to construct a Japanese parsed corpus while simultaneously improving a morphological analyzer and a parser. Over the two-year project, we achieved the following results: (a) We enhanced our morphological analyzer, JUMAN, to handle a word string as a whole, and to find and register problematic fixed expressions that normal morphological analysis had analyzed incorrectly. We released the enhanced version, JUMAN3.0, in October 1996. (b) We improved the treatment of coordination structures and subordinate structures in our parser, KNP. KNP was also enhanced to handle several types of phrases with exceptional sentential functions. We released the enhanced version, KNP2.0, in March 1997. (c) We built a mouse-based interface to assist and speed up manual correction of the tags assigned by JUMAN and KNP. The interface also provides retrieval functions for the corpus. (d) As of March 1997, we had constructed a parsed and manually corrected corpus of about 20,000 sentences, of which about 10,000 sentences were made publicly available in March 1997.
|
Report
(3 results)
Research Products
(13 results)