Research on Automatic Incremental Construction of Language Resources

Research Project

Project/Area Number	09308009
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Tokyo Institute of Technology
Principal Investigator	TANAKA Hozumi, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Professor (80163567)
Co-Investigator(Kenkyū-buntansha)	SHIRAI Kiyoaki, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Research Assistant (30302970); INUI Kentaro, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Assistant Professor (60272689); TOKUNAGA Takenobu, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Associate Professor (20197875)
Project Period (FY)	1997 – 1999
Project Status	Completed (Fiscal Year 1999)
Budget Amount	¥22,700,000 (Direct Cost: ¥22,700,000); Fiscal Year 1999: ¥2,600,000 (Direct Cost: ¥2,600,000); Fiscal Year 1998: ¥7,200,000 (Direct Cost: ¥7,200,000); Fiscal Year 1997: ¥12,900,000 (Direct Cost: ¥12,900,000)
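Keywords	language resource / corpus / morphological analysis / syntactic analysis / probabilistic language model / automatic acquisition of knowledge / linguistic knowledge base / natural language processing / annotated corpus / linguistic knowledge acquisition / MSLR parsing method / probabilistic generalized LR model / probabilistic GLR parsing method / morpheme connection table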
Research Abstract	This research project targets the automatic, incremental construction of a corpus annotated with morphological information, such as word segmentation and part-of-speech (POS) tags, and with syntactic information, such as syntactic trees. An overview of the proposed method is as follows. We first analyze large volumes of text to obtain the morphological and syntactic information with which to annotate it. Next, we acquire new knowledge for natural language analysis from the annotated text: we acquire a connection matrix and train the probabilistic generalized LR (PGLR) language model. The connection matrix describes adjacency constraints between POS pairs. It can be acquired automatically from a POS-tagged corpus by regarding each POS pair as legally adjacent if it appears in sequence in the training corpus, and as illegal otherwise. The PGLR model provides a probabilistic language model and is easily trainable given a tree-annotated corpus. Given these knowledge resources, we re-analyze the sentences and obtain new morphological and syntactic information. By repeating this procedure, we automatically construct a corpus annotated with morphological and syntactic information. Our experiment shows that the proposed method is effective for enlarging existing annotated corpora.
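
The connection-matrix acquisition described in the abstract can be illustrated with a minimal sketch. It assumes a corpus represented as lists of (word, POS) pairs; the function names, the toy English tag set, and the example sentences are hypothetical and are not taken from the project's MSLR tools.

```python
def build_connection_matrix(tagged_sentences):
    """Collect every POS pair that occurs adjacently in the training corpus.

    The set of pairs stands in for the connection matrix: a pair in the set
    is treated as legally adjacent, and any other pair as illegal.
    """
    allowed = set()
    for sentence in tagged_sentences:
        tags = [pos for _word, pos in sentence]
        # zip(tags, tags[1:]) yields each pair of neighbouring tags.
        allowed.update(zip(tags, tags[1:]))
    return allowed


def is_legal(allowed, tags):
    """Return True if every adjacent POS pair in `tags` was seen in training."""
    return all(pair in allowed for pair in zip(tags, tags[1:]))


# Hypothetical toy corpus; a real corpus would come from the analyzed text.
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("small", "ADJ"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
matrix = build_connection_matrix(corpus)
```

With this toy corpus, `is_legal(matrix, ["DET", "ADJ", "NOUN", "VERB"])` holds because every neighbouring pair was observed, while `is_legal(matrix, ["NOUN", "DET"])` does not. In the iterative procedure, such a matrix would be rebuilt from each newly annotated version of the corpus before the sentences are re-analyzed.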

Report

(4 results)

1999 Annual Research Report Final Research Report Summary
1998 Annual Research Report
1997 Annual Research Report

Research Products
(10 results)

All Publications (10 results)

[Publications] Kentaro Inui: "Probabilistic GLR Parsing: A New Formalization and Its Impact on Parsing Performance" Journal of Natural Language Processing. 5. 33-52 (1998)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1999 Final Research Report Summary
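[Publications] Kiyoaki Shirai: "A Framework of Integrating Syntactic and Lexical Statistics in Statistical Parsing" Journal of Natural Language Processing. 5. 85-106 (1998)
- Description
  From the Final Research Report Summary (Japanese)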
- Related Report
  1999 Final Research Report Summary
[Publications] Virach Sornlertlamvanich: "Empirical Support for New Probabilistic Generalized LR Parsing" Journal of Natural Language Processing. 6. 3-22 (1999)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1999 Final Research Report Summary
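[Publications] Hiroki Imai: "Construction of Language Model Using Probabilistic GLR Methods toward Speech Recognition" Transactions of Information Processing Society of Japan. 40. 1404-1412 (1999)
- Description
  From the Final Research Report Summary (Japanese)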
- Related Report
  1999 Final Research Report Summary
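[Publications] Kiyoaki Shirai: "MSLR Parser Tool Kit - Tools for Natural Language Analysis" Journal of Natural Language Processing. (to appear).
- Description
  From the Final Research Report Summary (Japanese)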
- Related Report
  1999 Final Research Report Summary
[Publications] Kentaro Inui: "Probabilistic GLR Parsing: A New Formalization and Its Impact on Parsing Performance" Journal of Natural Language Processing. 5. 33-52 (1998)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1999 Final Research Report Summary
[Publications] Kiyoaki Shirai: "A Framework of Integrating Syntactic and Lexical Statistics in Statistical Parsing" Journal of Natural Language Processing. 5. 85-106 (1998)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1999 Final Research Report Summary
[Publications] Virach Sornlertlamvanich: "Empirical Support for New Probabilistic Generalized LR Parsing" Journal of Natural Language Processing. 6. 3-22 (1999)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1999 Final Research Report Summary
[Publications] Hiroki Imai: "Construction of Language Model Using Probabilistic GLR Methods toward Speech Recognition" Transactions of Information Processing Society of Japan. 40. 1404-1412 (1999)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1999 Final Research Report Summary
[Publications] Kiyoaki Shirai: "MSLR Parser Tool Kit - Tools for Natural Language Analysis" Journal of Natural Language Processing. (to appear).
- Description
  From the Final Research Report Summary (English)
- Related Report
  1999 Final Research Report Summary
