2005 Fiscal Year Final Research Report Summary
Studies on Corpus Creation and Use for Linguistic Research
Project/Area Number | 15300046 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Nara Institute of Science and Technology |
Principal Investigator | MATSUMOTO Yuji, Nara Institute of Science and Technology, Graduate School of Information Science, Professor (10211575) |
Co-Investigator (Kenkyū-buntansha) |
ASAHARA Masayuki, Nara Institute of Science and Technology, Graduate School of Information Science, Assistant Professor (80379528)
HASHIMOTO Kiyota, Osaka Prefectural University, School of Humanities and Social Sciences, Associate Professor (50278818)
TONO Yukio, Meikai University, Faculty of Languages, Professor (10211393)
OHTANI Akira, Osaka Gakuin University, Faculty of Informatics, Lecturer (50283817)
Project Period (FY) | 2003 – 2005 |
Keywords | corpus / natural language processing / part-of-speech tagging / dependency analysis / database / retrieval / multi-lingual processing / KWIC |
Research Abstract |
In language processing research, we extended the language analysis tools we have been developing, such as a Japanese morphological analyzer and a Japanese dependency analyzer, to the analysis of Chinese. For dictionary development, we implemented an unknown word analysis system for Chinese and extracted candidate new word entries by running the system on a large-scale Chinese corpus. Through this experiment, we successfully constructed a large-scale Chinese dictionary with about one hundred thousand word entries. For Japanese, we described the constituent word information of Japanese compound words and registered this information in the dictionary. For English, we developed a method for distinguishing literal and idiomatic uses of English multi-word expressions, which distinguished them with high accuracy. For corpus tool development, we produced a detailed design of the database schemas for annotated corpora and dictionary entries, and re-implemented the corpus management tool based on these schemas. We also implemented error correction functions for part-of-speech and dependency analysis errors, and designed and implemented an interface for these functions. In addition, we implemented a visualization function that displays phrasal chunks and their dependency relations, on which one of the error correction functions is built. The corpus management tools have been made publicly available, and we held two seminars to introduce them and explain their usage to prospective users, with the aim of collecting user feedback. We also opened a Web page that introduces the tools and provides them for download.
Research Products
(12 results)