Project/Area Number |
15300046
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Nara Institute of Science and Technology |
Principal Investigator |
MATSUMOTO Yuji Nara Institute of Science and Technology, Graduate School of Information Science, professor (10211575)
|
Co-Investigator(Kenkyū-buntansha) |
ASAHARA Masayuki Nara Institute of Science and Technology, Graduate School of Information Science, Assistant professor (80379528)
HASHIMOTO Kiyota Osaka Prefectural University, School of Humanities & Social Sciences, associate professor (50278818)
TONO Yukio Meikai University, Faculty of Languages, professor (10211393)
OHTANI Akira Osaka Gakuin University, Faculty of Informatics, Lecturer (50283817)
INUI Kentaro Nara Institute of Science and Technology, Graduate School of Information Science, associate professor (60272689)
|
Project Period (FY) |
2003 – 2005
|
Project Status |
Completed (Fiscal Year 2005)
|
Budget Amount |
¥14,500,000 (Direct Cost: ¥14,500,000)
Fiscal Year 2005: ¥5,300,000 (Direct Cost: ¥5,300,000)
Fiscal Year 2004: ¥4,600,000 (Direct Cost: ¥4,600,000)
Fiscal Year 2003: ¥4,600,000 (Direct Cost: ¥4,600,000)
|
Keywords | corpus / natural language processing / part-of-speech tagging / dependency analysis / database / retrieval / multi-lingual processing / KWIC / language corpus / language processing / word search / string search / tagged corpus |
Research Abstract |
In language processing research, we extended the language analysis tools we have been developing, such as our Japanese morphological analyzer and Japanese dependency analyzer, to handle Chinese. For dictionary development, we implemented an unknown-word analysis system for Chinese and extracted candidate new word entries by running it on a large-scale Chinese corpus. Through this experiment, we constructed a large-scale Chinese dictionary with about a hundred thousand word entries. For Japanese, we described the constituent-word information of Japanese compound words and registered this information in the dictionary. For English, we developed a method for distinguishing literal and idiomatic uses of English multi-word expressions and showed that it distinguishes them with high accuracy. For corpus tool development, we made a detailed design of the database schemas for annotated corpora and dictionary entries and re-implemented the corpus management tool based on these schemas. We also implemented error correction functions for part-of-speech and dependency analysis errors, and designed and implemented the interface for these functions. One of the error correction functions is realized on top of a visualization function that shows phrasal chunks and their dependency relations. The developed corpus management tools have been made publicly available, and we held two seminars to introduce them and explain their usage to prospective users, with the aim of collecting user feedback. We also opened a Web page for introducing and downloading the tools.
|