
Research on Automatic Incremental Construction of Language Resources

Research Project

Project/Area Number 09308009
Research Category

Grant-in-Aid for Scientific Research (A)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Tokyo Institute of Technology

Principal Investigator

TANAKA Hozumi, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Professor (80163567)

Co-Investigator (Kenkyū-buntansha) SHIRAI Kiyoaki, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Research Assistant (30302970)
INUI Kentaro, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Assistant Professor (60272689)
TOKUNAGA Takenobu, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Associate Professor (20197875)
Project Period (FY) 1997 – 1999
Project Status Completed (Fiscal Year 1999)
Budget Amount
¥22,700,000 (Direct Cost: ¥22,700,000)
Fiscal Year 1999: ¥2,600,000 (Direct Cost: ¥2,600,000)
Fiscal Year 1998: ¥7,200,000 (Direct Cost: ¥7,200,000)
Fiscal Year 1997: ¥12,900,000 (Direct Cost: ¥12,900,000)
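Keywords language resource / corpus / morphological analysis / syntactic analysis / probabilistic language model / automatic acquisition of knowledge / linguistic knowledge base / natural language processing / annotated corpus / acquisition of linguistic knowledge / MSLR parsing / probabilistic generalized LR model / probabilistic GLR parsing / morpheme connection table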
Research Abstract

This research project targets the automatic, incremental construction of a corpus annotated with morphological information, such as word segmentation and part-of-speech (POS hereafter) tags, and with syntactic information, such as syntactic trees. The proposed method works as follows. We first analyze a large volume of text to obtain the morphological and syntactic information with which to annotate it. Next, we acquire new knowledge for natural language analysis from the annotated text: we acquire a connection matrix and train the probabilistic generalized LR language model (PGLR model hereafter). The connection matrix describes adjacency constraints between POS pairs. It can be acquired automatically from a POS-tagged corpus by regarding each POS pair as legally adjacent if it appears in sequence in the training corpus, and as illegal otherwise. The PGLR model is a probabilistic language model that is easily trained given a tree-annotated corpus. Using these knowledge resources, we re-analyze the sentences and obtain new morphological and syntactic information. By repeating this procedure, we automatically construct a corpus annotated with morphological and syntactic information. Our experiment shows that the proposed method is effective for enlarging existing annotated corpora.
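The connection-matrix acquisition rule described in the abstract (a POS pair is legal if and only if it occurs in sequence in the training corpus) can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `(word, POS)` sentence representation, the function names `acquire_connection_matrix` and `is_legal`, and the toy tags are all hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Hypothetical representation: a POS-tagged sentence is a sequence of
# (word, POS) pairs. The project's actual corpus format is not specified here.
TaggedSentence = Sequence[Tuple[str, str]]


def acquire_connection_matrix(corpus: Iterable[TaggedSentence]) -> Set[Tuple[str, str]]:
    """Return the set of POS pairs (left, right) seen adjacently in the corpus.

    Every pair in the returned set is treated as legally adjacent; any pair
    not in the set is treated as illegal.
    """
    legal: Set[Tuple[str, str]] = set()
    for sentence in corpus:
        tags: List[str] = [pos for _, pos in sentence]
        # zip(tags, tags[1:]) yields each adjacent (left, right) POS pair.
        for left, right in zip(tags, tags[1:]):
            legal.add((left, right))
    return legal


def is_legal(matrix: Set[Tuple[str, str]], left: str, right: str) -> bool:
    """Check whether POS `left` may be immediately followed by POS `right`."""
    return (left, right) in matrix


# Toy example with made-up tags (romanized Japanese words, simplified POS set).
corpus = [
    [("kare", "noun"), ("wa", "particle"), ("hashiru", "verb")],
    [("hon", "noun"), ("wo", "particle"), ("yomu", "verb")],
]
matrix = acquire_connection_matrix(corpus)
print(sorted(matrix))                    # [('noun', 'particle'), ('particle', 'verb')]
print(is_legal(matrix, "noun", "verb"))  # False
```

Storing only the legal pairs as a set keeps the sketch short; a dense boolean table indexed by POS IDs would serve equally well. Because any pair unseen in the training data is rejected, the matrix can only become more permissive as the annotated corpus grows, which is one reason to re-acquire it on each iteration.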
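The abstract states that the PGLR model is easily trained given a tree-annotated corpus. One way to see why: once each annotated tree is replayed as the sequence of LR actions that builds it, training reduces to counting. The sketch below follows, as far as we understand it, the formalization in Inui et al. (listed under Research Products), where a state reached by a shift action conditions on the state alone, P(l, a | s), while a state reached after a reduce conditions on the state and the already-fixed lookahead, P(a | s, l). The transition tuple format, the `shift_states` set, the function names, and the toy data are hypothetical; a real system would derive the action sequences and the shift states from the LR table.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Hypothetical representation: one step of an LR parse is
# (state before the action, lookahead symbol, action).
Transition = Tuple[Hashable, str, str]


def estimate_pglr(parses: Iterable[List[Transition]],
                  shift_states: Set[Hashable]) -> Dict[Transition, float]:
    """Relative-frequency estimates of action probabilities.

    For a state reached by a shift (in `shift_states`), estimate P(l, a | s):
    the lookahead l is newly read, so counts are normalized over all
    (lookahead, action) pairs observed in that state.
    For a state reached after a reduce, estimate P(a | s, l): the lookahead
    was already fixed, so counts are normalized per (state, lookahead).
    """
    joint: Counter = Counter()         # C(s, l, a)
    per_state: Counter = Counter()     # C(s)    = sum over l, a
    per_state_la: Counter = Counter()  # C(s, l) = sum over a
    for parse in parses:
        for state, lookahead, action in parse:
            joint[(state, lookahead, action)] += 1
            per_state[state] += 1
            per_state_la[(state, lookahead)] += 1

    probs: Dict[Transition, float] = {}
    for (state, lookahead, action), count in joint.items():
        if state in shift_states:
            probs[(state, lookahead, action)] = count / per_state[state]
        else:
            probs[(state, lookahead, action)] = count / per_state_la[(state, lookahead)]
    return probs


def parse_probability(parse: List[Transition], probs: Dict[Transition, float]) -> float:
    """Probability of a parse as the product of its transition probabilities.

    Unseen transitions get probability 0 (no smoothing in this sketch).
    """
    p = 1.0
    for transition in parse:
        p *= probs.get(transition, 0.0)
    return p


# Toy action sequences (not derived from a real grammar). States 0 and 1 are
# assumed to be shift states; state 2 is assumed to be reached after a reduce.
parses = [
    [(0, "n", "sh1"), (1, "v", "re1"), (2, "v", "sh3")],
    [(0, "n", "sh1"), (1, "$", "re1"), (2, "$", "acc")],
    [(0, "v", "sh3")],
]
probs = estimate_pglr(parses, shift_states={0, 1})
print(probs[(0, "n", "sh1")])               # ≈ 0.667 (2 of the 3 actions taken in state 0)
print(parse_probability(parses[0], probs))  # ≈ 0.333
```

The two normalizations differ only in which counter supplies the denominator, so the estimator stays a single pass over the action sequences. Smoothing for unseen transitions is deliberately omitted here.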

Report

(4 results)
  • 1999 Annual Research Report
  • 1999 Final Research Report Summary
  • 1998 Annual Research Report
  • 1997 Annual Research Report

Research Products (10 results)


All Publications (10 results)

  • [Publications] Kentaro Inui: "Probabilistic GLR Parsing: A New Formalization and Its Impact on Parsing Performance" Journal of Natural Language Processing. 5. 33-52 (1998)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      1999 Final Research Report Summary
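  • [Publications] Kiyoaki Shirai: "A Framework of Integrating Syntactic and Lexical Statistics in Statistical Parsing" Journal of Natural Language Processing. 5. 85-106 (1998)

    • Description
      From the Final Research Report Summary (Japanese)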
    • Related Report
      1999 Final Research Report Summary
  • [Publications] Virach Sornlertlamvanich: "Empirical Support for New Probabilistic Generalized LR Parsing" Journal of Natural Language Processing. 6. 3-22 (1999)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      1999 Final Research Report Summary
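  • [Publications] Hiroki Imai: "Construction of Language Model Using Probabilistic GLR Methods toward Speech Recognition" Transactions of Information Processing Society of Japan. 40. 1404-1412 (1999)

    • Description
      From the Final Research Report Summary (Japanese)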
    • Related Report
      1999 Final Research Report Summary
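  • [Publications] Kiyoaki Shirai: "MSLR Parser Tool Kit - Tools for Natural Language Analysis" Journal of Natural Language Processing. (accepted, to appear).

    • Description
      From the Final Research Report Summary (Japanese)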
    • Related Report
      1999 Final Research Report Summary
  • [Publications] Kentaro Inui: "Probabilistic GLR Parsing: A New Formalization and Its Impact on Parsing Performance" Journal of Natural Language Processing. 5. 33-52 (1998)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1999 Final Research Report Summary
  • [Publications] Kiyoaki Shirai: "A Framework of Integrating Syntactic and Lexical Statistics in Statistical Parsing" Journal of Natural Language Processing. 5. 85-106 (1998)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1999 Final Research Report Summary
  • [Publications] Virach Sornlertlamvanich: "Empirical Support for New Probabilistic Generalized LR Parsing" Journal of Natural Language Processing. 6. 3-22 (1999)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1999 Final Research Report Summary
  • [Publications] Hiroki Imai: "Construction of Language Model Using Probabilistic GLR Methods toward Speech Recognition" Transactions of Information Processing Society of Japan. 40. 1404-1412 (1999)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1999 Final Research Report Summary
  • [Publications] Kiyoaki Shirai: "MSLR Parser Tool Kit - Tools for Natural Language Analysis" Journal of Natural Language Processing. (to appear).

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      1999 Final Research Report Summary


Published: 1997-04-01   Modified: 2016-04-21  


Powered by NII kakenhi