Studies on Corpus Creation and Use for Linguistic Research

Research Project

Project/Area Number 15300046
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Nara Institute of Science and Technology

Principal Investigator

MATSUMOTO Yuji  Nara Institute of Science and Technology, Graduate School of Information Science, Professor (10211575)

Co-Investigator (Kenkyū-buntansha) ASAHARA Masayuki  Nara Institute of Science and Technology, Graduate School of Information Science, Assistant Professor (80379528)
HASHIMOTO Kiyota  Osaka Prefectural University, School of Humanities & Social Sciences, Associate Professor (50278818)
TONO Yukio  Meikai University, Faculty of Languages, Professor (10211393)
OHTANI Akira  Osaka Gakuin University, Faculty of Informatics, Lecturer (50283817)
INUI Kentaro  Nara Institute of Science and Technology, Graduate School of Information Science, Associate Professor (60272689)
Project Period (FY) 2003 – 2005
Project Status Completed (Fiscal Year 2005)
Budget Amount
¥14,500,000 (Direct Cost: ¥14,500,000)
Fiscal Year 2005: ¥5,300,000 (Direct Cost: ¥5,300,000)
Fiscal Year 2004: ¥4,600,000 (Direct Cost: ¥4,600,000)
Fiscal Year 2003: ¥4,600,000 (Direct Cost: ¥4,600,000)
Keywords corpus / natural language processing / part-of-speech tagging / dependency analysis / database / retrieval / multi-lingual processing / KWIC / language corpus / language processing / word search / string search / tagged corpus
Research Abstract

For language processing research, we extended the language analysis tools we have been developing, such as the Japanese morphological analyzer and the Japanese dependency analyzer, to handle Chinese.
For dictionary development, we implemented an unknown-word analysis system for Chinese and extracted candidate new word entries by running it on a large-scale Chinese corpus. Through this experiment we constructed a large-scale Chinese dictionary of about one hundred thousand word entries. For Japanese, we described the constituent-word information of Japanese compound words and registered this information in the dictionary. For English, we developed a method for distinguishing literal from idiomatic uses of English multi-word expressions and showed that it distinguishes them with high accuracy.
For corpus tool development, we designed in detail the database schemas for annotated corpora and dictionary entries, and re-implemented the corpus management tool on top of these schemas. We also implemented error correction functions for part-of-speech and dependency analysis errors, and designed and implemented their interface. In addition, we implemented a visualization function that displays phrasal chunks and their dependency relations; one of the error correction functions is realized on top of it.
The corpus management tools we developed have been released to the public. We held two seminars to announce the release and explain usage to prospective users, with the aim of collecting user feedback. We also opened a Web page that introduces the tools and offers them for download.
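
As an illustration of the corpus tool design described in the abstract, the following is a minimal sketch of how an annotated corpus could be stored in a relational table, searched in KWIC (keyword in context) form, and corrected when a part-of-speech tag is wrong. It is not the project's actual schema or code: the table layout, the column names (sent_id, pos_in_sent, surface, pos_tag, head) and the functions build_corpus, kwic and correct_tag are all assumptions made for this example.

```python
import sqlite3


def build_corpus(sentences):
    """Store tagged sentences in an in-memory SQLite table (hypothetical schema).

    Each sentence is a list of (surface, pos_tag, head) tuples, where head is
    the index of the governing token in the same sentence, or -1 for the root.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token ("
        " sent_id INTEGER, pos_in_sent INTEGER,"
        " surface TEXT, pos_tag TEXT, head INTEGER,"
        " PRIMARY KEY (sent_id, pos_in_sent))"
    )
    rows = [
        (sid, i, surface, tag, head)
        for sid, sent in enumerate(sentences)
        for i, (surface, tag, head) in enumerate(sent)
    ]
    conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def kwic(conn, keyword, width=2):
    """Return (left context, keyword, right context) for every exact match."""
    hits = conn.execute(
        "SELECT sent_id, pos_in_sent FROM token"
        " WHERE surface = ? ORDER BY sent_id, pos_in_sent",
        (keyword,),
    ).fetchall()
    lines = []
    for sid, idx in hits:
        # Fetch up to `width` tokens on each side within the same sentence.
        ctx = conn.execute(
            "SELECT pos_in_sent, surface FROM token"
            " WHERE sent_id = ? AND pos_in_sent BETWEEN ? AND ?"
            " ORDER BY pos_in_sent",
            (sid, idx - width, idx + width),
        ).fetchall()
        left = " ".join(s for p, s in ctx if p < idx)
        right = " ".join(s for p, s in ctx if p > idx)
        lines.append((left, keyword, right))
    return lines


def correct_tag(conn, sent_id, pos_in_sent, new_tag):
    """Overwrite the part-of-speech tag of one token (manual error correction)."""
    conn.execute(
        "UPDATE token SET pos_tag = ? WHERE sent_id = ? AND pos_in_sent = ?",
        (new_tag, sent_id, pos_in_sent),
    )
```

With a corpus built from a few tagged sentences, kwic(conn, "cat", width=1) returns one (left, keyword, right) triple per occurrence, and correct_tag updates a single token in place. Keeping one row per token is only one possible design; it makes both word search and per-token correction a single SQL statement.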

Report

(4 results)
  • 2005 Annual Research Report   Final Research Report Summary
  • 2004 Annual Research Report
  • 2003 Annual Research Report
  • Research Products

    (28 results)


All Journal Article (23 results) Publications (5 results)

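  • [Journal Article] Japanese Dependency Analysis Model with Relative Strength of Dependency (in Japanese) (2005)

    • Author(s)
      Taku Kudo, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.46, No.4

      Pages: 1082-1092

    • NAID

      110002911748

    • Description
      From "Final Research Report Summary (Japanese)"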
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Chinese Word Segmentation by Classification of Characters (2005)

    • Author(s)
      Chooi-Ling Goh, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      International Journal of Computational Linguistics and Chinese Language Processing Vol.10, No.3

      Pages: 381-396

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2005 Annual Research Report 2005 Final Research Report Summary
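  • [Journal Article] Chinese and Japanese Word Segmentation with Word Level and Character Level Information (in Japanese) (2005)

    • Author(s)
      Tetsuji Nakagawa, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.46, No.11

      Pages: 2714-2727

    • NAID

      110002911747

    • Description
      From "Final Research Report Summary (Japanese)"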
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] ChaKi: An Annotated Corpora Management and Search System (2005)

    • Author(s)
      Yuji Matsumoto, Masayuki Asahara, et al.
    • Journal Title

      Proceedings from the Corpus Linguistics Conference Series Vol.1, No.1

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Automatic Extraction of Fixed Multiword Expressions (2005)

    • Author(s)
      Campbell Hore, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Natural Language Processing, Second International Joint Conference, Lecture Notes in Artificial Intelligence Vol.3651

      Pages: 565-575

    • NAID

      110002949453

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Chinese Deterministic Dependency Analyzer: Examining Effects of Global Features and Root Node Finder (2005)

    • Author(s)
      Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Fourth SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop Vol.4

      Pages: 17-24

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Japanese Dependency Analysis Model with Relative Strength of Dependency (in Japanese) (2005)

    • Author(s)
      Taku Kudo, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.46, No.4

      Pages: 1082-1092

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Chinese Word Segmentation by Classification of Characters (2005)

    • Author(s)
      Chooi-Ling Goh, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      International Journal of Computational Linguistics and Chinese Language Processing Vol.10, No.3

      Pages: 381-396

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Chinese and Japanese Word Segmentation with Word Level and Character Level Information (in Japanese) (2005)

    • Author(s)
      Tetsuji Nakagawa, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.46, No.11

      Pages: 2714-2727

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] ChaKi: An Annotated Corpora Management and Search System (2005)

    • Author(s)
      Yuji Matsumoto, Masayuki Asahara, Yukio Tono, Akira Ohtani, Toshio Morita
    • Journal Title

      Proceedings from the Corpus Linguistics Conference Series Vol.1, No.1

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Automatic Extraction of Fixed Multiword Expressions (2005)

    • Author(s)
      Campbell Hore, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Natural Language Processing, Second International Joint Conference, Lecture Notes in Artificial Intelligence Vol.3651

      Pages: 565-575

    • NAID

      110002949453

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2005 Final Research Report Summary
  • [Journal Article] Chinese Deterministic Dependency Analyzer: Examining Effects of Global Features and Root Node Finder (2005)

    • Author(s)
      Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Fourth SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop Vol.4

      Pages: 17-24

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2005 Final Research Report Summary
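  • [Journal Article] Japanese Dependency Analysis Model with Relative Strength of Dependency (in Japanese) (2005)

    • Author(s)
      Taku Kudo, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.46, No.4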

      Pages: 1082-1092

    • NAID

      110002911748

    • Related Report
      2005 Annual Research Report
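  • [Journal Article] Chinese and Japanese Word Segmentation with Word Level and Character Level Information (in Japanese) (2005)

    • Author(s)
      Tetsuji Nakagawa, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.46, No.11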

      Pages: 2714-2727

    • NAID

      110002911747

    • Related Report
      2005 Annual Research Report
  • [Journal Article] ChaKi: An Annotated Corpora Management and Search System (2005)

    • Author(s)
      Yuji Matsumoto, Masayuki Asahara, Kou Kawabe, Yurika Takahashi, Yukio Tono, Akira Ohtani, Toshio Morita
    • Journal Title

      Proceedings from the Corpus Linguistics Conference Series Vol.1, No.1

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Automatic Extraction of Fixed Multiword Expressions (2005)

    • Author(s)
      Campbell Hore, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Natural Language Processing, Second International Joint Conference, Lecture Notes in Artificial Intelligence Vol.3651

      Pages: 565-575

    • NAID

      110002949453

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Chinese Deterministic Dependency Analyzer: Examining Effects of Global Features and Root Node Finder (2005)

    • Author(s)
      Yuchang Cheng, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Fourth SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop Vol.4

      Pages: 17-24

    • Related Report
      2005 Annual Research Report
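  • [Journal Article] Current Status of ChaKi, a Management and Search Tool for Annotated Corpora (in Japanese) (2005)

    • Author(s)
      Yuji Matsumoto, Masayuki Asahara, Kou Kawabe, Yurika Takahashi, Yukio Tono, Akira Ohtani, Toshio Morita
    • Journal Title

      Annual Meeting of the Association for Natural Language Processing Vol.11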

    • Related Report
      2004 Annual Research Report
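  • [Journal Article] Resolving the Word Segmentation Problem in Japanese Named Entity Extraction (in Japanese) (2004)

    • Author(s)
      Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.45, No.5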

      Pages: 1442-1450

    • NAID

      110002712193

    • Related Report
      2004 Annual Research Report
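  • [Journal Article] Deterministic Bottom-up Dependency Structure Analysis Using Support Vector Machines (in Japanese) (2004)

    • Author(s)
      Hiroyasu Yamada, Yuji Matsumoto
    • Journal Title

      Transactions of Information Processing Society of Japan Vol.45, No.10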

      Pages: 2416-2427

    • NAID

      110002712084

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Japanese Unknown Word Identification by Character-based Chunking (2004)

    • Author(s)
      Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Proceedings of 20th International Conference on Computational Linguistics Vol.20

      Pages: 459-465

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Pruning False Unknown Words to Improve Chinese Word Segmentation (2004)

    • Author(s)
      Chooi-Ling Goh, Masayuki Asahara, Yuji Matsumoto
    • Journal Title

      Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation Vol.18

      Pages: 139-149

    • NAID

      120006851427

    • Related Report
      2004 Annual Research Report
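  • [Journal Article] Japanese Analysis with ChaSen and CaboCha: Classifying the Roles of Sentences Using Syntactic Information (in Japanese) (2004)

    • Author(s)
      Yuji Matsumoto, Kazuma Takaoka, Masayuki Asahara, Taku Kudo
    • Journal Title

      Journal of the Japanese Society for Artificial Intelligence Vol.19, No.3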

      Pages: 334-339

    • Related Report
      2004 Annual Research Report
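  • [Publications] Tetsuji Nakagawa, Taku Kudo, Yuji Matsumoto: "Morphological Analysis Using Support Vector Machines and a Proposal of Revision Learning (in Japanese)" Transactions of Information Processing Society of Japan. Vol.44, No.5. 1354-1367 (2003)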

    • Related Report
      2003 Annual Research Report
  • [Publications] Masayuki Asahara, Yuji Matsumoto: "Filler and disfluency identification based on morphological analysis and chunking" Proceedings of ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition. 163-166 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] Masayuki Asahara, Yuji Matsumoto: "Japanese named entity extraction with redundant morphological analysis" Proc. Human Language Technology and North American Chapter of Association for Computational Linguistics. 4. 8-15 (2003)

    • Related Report
      2003 Annual Research Report
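  • [Publications] Taku Kudo, Yuji Matsumoto: "Subtree-based Markov Random Fields and Their Application to Language Analysis (in Japanese)" IPSJ SIG Technical Reports, Natural Language Processing / Fundamentals of Informatics. 157. 33-40 (2003)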

    • Related Report
      2003 Annual Research Report
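  • [Publications] Yuji Matsumoto and 8 others: "ChaKi, a Storage and Search Tool for Annotated Corpora (in Japanese)" Proceedings of the 10th Annual Meeting of the Association for Natural Language Processing. 10. 405-408 (2004)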

    • Related Report
      2003 Annual Research Report


Published: 2003-04-01   Modified: 2016-04-21  
