2005 Fiscal Year Final Research Report Summary

Studies on Corpus Creation and Use for Linguistic Research

Research Project

Project/Area Number	15300046
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Nara Institute of Science and Technology
Principal Investigator	MATSUMOTO Yuji Nara Institute of Science and Technology, Graduate School of Information Science, Professor (10211575)
Co-Investigator(Kenkyū-buntansha)	ASAHARA Masayuki Nara Institute of Science and Technology, Graduate School of Information Science, Assistant Professor (80379528); HASHIMOTO Kiyota Osaka Prefectural University, School of Humanities & Social Sciences, Associate Professor (50278818); TONO Yukio Meikai University, Faculty of Languages, Professor (10211393); OHTANI Akira Osaka Gakuin University, Faculty of Informatics, Lecturer (50283817)
Project Period (FY)	2003 – 2005
Keywords	corpus / natural language processing / part-of-speech tagging / dependency analysis / database / retrieval / multi-lingual processing / KWIC
Research Abstract	On the language processing side, we extended the analysis tools we have been developing, namely our Japanese morphological analyzer and Japanese dependency analyzer, to handle Chinese. For dictionary development, we implemented an unknown word analysis system for Chinese and extracted candidate new word entries by running it on a large-scale Chinese corpus. Through this experiment, we constructed a large-scale Chinese dictionary of about one hundred thousand word entries. For Japanese, we described the constituent words of Japanese compound words and registered this information in the dictionary. For English, we developed a method for distinguishing literal from idiomatic uses of English multi-word expressions and showed that it distinguishes them with high accuracy. For corpus tool development, we designed detailed database schemas for annotated corpora and dictionary entries and re-implemented the corpus management tool on top of these schemas. We also implemented error correction functions for part-of-speech and dependency analysis errors, and designed and implemented their interface. In addition, we implemented a visualization function that displays phrasal chunks and their dependency relations; one of the error correction functions is built on this visualization. The corpus management tools have been released to the public, and we held two seminars to publicize them and explain their usage to prospective users, with the aim of collecting user feedback. We also opened a Web page that introduces the tools and offers them for download.

Research Products
(12 results)


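[Journal Article] Japanese Dependency Analysis Model with Relative Strength of Dependency (in Japanese) (2005)
- Author(s)
  Taku Kudo, Yuji Matsumoto
- Journal Title
  
  Transactions of Information Processing Society of Japan Vol.46, No.4
  
  Pages: 1082-1092
- Description
  From the Research Report Summary (Japanese version)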
[Journal Article] Chinese Word Segmentation by Classification of Characters (2005)
- Author(s)
  Chooi-Ling Goh, Masayuki Asahara, Yuji Matsumoto
- Journal Title
  
  International Journal of Computational Linguistics and Chinese Language Processing Vol.10, No.3
  
  Pages: 381-396
- Description
  From the Research Report Summary (Japanese version)
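[Journal Article] Chinese and Japanese Word Segmentation with Word Level and Character Level Information (in Japanese) (2005)
- Author(s)
  Tetsuji Nakagawa, Yuji Matsumoto
- Journal Title
  
  Transactions of Information Processing Society of Japan Vol.46, No.11
  
  Pages: 2714-2727
- Description
  From the Research Report Summary (Japanese version)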
[Journal Article] ChaKi : An Annotated Corpora Management and Search System (2005)
- Author(s)
  Yuji Matsumoto, Masayuki Asahara, et al.
- Journal Title
  
  Proceedings from the Corpus Linguistics Conference Series Vol.1, No.1
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Automatic Extraction of Fixed Multiword Expressions (2005)
- Author(s)
  Campbell Hore, Masayuki Asahara, Yuji Matsumoto
- Journal Title
  
  Natural Language Processing, Second International Joint Conference, Lecture Notes in Artificial Intelligence Vol.3651
  
  Pages: 565-575
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Chinese Deterministic Dependency Analyzer : Examining Effects of Global Features and Root Node Finder (2005)
- Author(s)
  Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto
- Journal Title
  
  Fourth SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop Vol.4
  
  Pages: 17-24
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Japanese Dependency Analysis Model with Relative Strength of Dependency (in Japanese) (2005)
- Author(s)
  Taku Kudo, Yuji Matsumoto
- Journal Title
  
  Transactions of Information Processing Society of Japan Vol.46, No.4
  
  Pages: 1082-1092
- Description
  From the Research Report Summary (English version)
[Journal Article] Chinese Word Segmentation by Classification of Characters (2005)
- Author(s)
  Chooi-Ling Goh, Masayuki Asahara, Yuji Matsumoto
- Journal Title
  
  International Journal of Computational Linguistics and Chinese Language Processing Vol.10, No.3
  
  Pages: 381-396
- Description
  From the Research Report Summary (English version)
[Journal Article] Chinese and Japanese Word Segmentation with Word Level and Character Level Information (in Japanese) (2005)
- Author(s)
  Tetsuji Nakagawa, Yuji Matsumoto
- Journal Title
  
  Transactions of Information Processing Society of Japan Vol.46, No.11
  
  Pages: 2714-2727
- Description
  From the Research Report Summary (English version)
[Journal Article] ChaKi : An Annotated Corpora Management and Search System (2005)
- Author(s)
  Yuji Matsumoto, Masayuki Asahara, Yukio Tono, Akira Ohtani, Toshio Morita
- Journal Title
  
  Proceedings from the Corpus Linguistics Conference Series Vol.1, No.1
- Description
  From the Research Report Summary (English version)
[Journal Article] Automatic Extraction of Fixed Multiword Expressions (2005)
- Author(s)
  Campbell Hore, Masayuki Asahara, Yuji Matsumoto
- Journal Title
  
  Natural Language Processing, Second International Joint Conference, Lecture Notes in Artificial Intelligence Vol.3651
  
  Pages: 565-575
- Description
  From the Research Report Summary (English version)
[Journal Article] Chinese Deterministic Dependency Analyzer : Examining Effects of Global Features and Root Node Finder (2005)
- Author(s)
  Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto
- Journal Title
  
  Fourth SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop Vol.4
  
  Pages: 17-24
- Description
  From the Research Report Summary (English version)
