Construction of a Probabilistic Ontology Based on a Large-Scale Japanese Corpus

Research Project

Project/Area Number	18700138
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Single-year Grants
Research Field	Intelligent informatics
Research Institution	Tokyo Institute of Technology
Principal Investigator	寺井あすか (Asuka Terai), Tokyo Institute of Technology, Graduate School of Information Science and Engineering, 21st Century COE Researcher (70422540)
Project Period (FY)	2006 – 2007
Project Status	Completed (Fiscal Year 2007)
Budget Amount	¥3,500,000 (Direct Cost: ¥3,500,000); Fiscal Year 2007: ¥1,900,000 (Direct Cost: ¥1,900,000); Fiscal Year 2006: ¥1,600,000 (Direct Cost: ¥1,600,000)
Keywords	Knowledge discovery and data mining / Statistical language analysis / Ontology
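Research Abstract	This study constructed a probabilistic ontology from dependency frequency data extracted from a large-scale Japanese corpus. Here, a probabilistic ontology is a hierarchy of categories formed by concepts, annotated with the probability that each concept belongs to each category (that is, the conditional probability of a category given a concept). We built a probabilistic ontology of nouns from ten years (1993 to 2002) of Mainichi Shimbun newspaper articles. First, we used CaboCha (Kudo and Matsumoto 2002) to extract the frequencies of adjective-noun dependencies and of noun-"が"-verb, noun-"に"-verb and noun-"を"-verb dependencies. Next, we estimated latent classes from these frequencies with a statistical language analysis (Kameya and Sato 2005) that assumes a word A (an adjective or a verb) and a word N (a noun) co-occur through an intervening latent class. Each noun was then represented by the probabilities P(noun | latent class), and the Rose model (1990), a soft clustering model, was applied to these representations to estimate the centroid of each category at each level, yielding a probabilistic hierarchy of nouns. The probability that a lower-level category belongs to an upper-level category was obtained by computing the membership probability of the lower-level category's centroid in that upper-level category. Finally, we conducted psychological experiments to verify the validity of the constructed probabilistic hierarchy.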
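The estimation steps in the abstract can be sketched in a few functions. This is a minimal, hypothetical illustration, not the project's implementation. It assumes the latent class model takes the form P(a, n) = sum over c of P(c) P(a|c) P(n|c), fitted by plain EM; it uses squared Euclidean distance and uniform noun weights in a simplified Rose-style soft clustering; and it fixes the number of centroids and the temperature T for each hierarchy level by hand, whereas full deterministic annealing lowers T gradually and lets categories split. All function names (`fit_latent_classes`, `soft_cluster`, `build_hierarchy`, `link_levels` and helpers) and the toy counts are invented for illustration.

```python
import math
import random


def normalise(weights):
    """Rescale a dict of non-negative weights so that the values sum to one."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def fit_latent_classes(counts, n_classes, n_iter=50, seed=0):
    """EM for the (assumed) latent class model P(a, n) = sum_c P(c) P(a|c) P(n|c).

    counts maps (a, n) pairs (a: adjective or verb, n: noun) to frequencies.
    Returns P(c) as a list, and P(a|c) and P(n|c) as lists of dicts indexed by c.
    """
    rng = random.Random(seed)
    A = sorted({a for a, _ in counts})
    N = sorted({n for _, n in counts})
    K = range(n_classes)
    p_c = [1.0 / n_classes for _ in K]
    p_a = [normalise({a: rng.random() + 0.5 for a in A}) for _ in K]
    p_n = [normalise({n: rng.random() + 0.5 for n in N}) for _ in K]
    for _ in range(n_iter):
        # E-step: split each pair's frequency over the latent classes
        # (a tiny pseudo-count keeps every probability strictly positive).
        acc_a = [dict.fromkeys(A, 1e-12) for _ in K]
        acc_n = [dict.fromkeys(N, 1e-12) for _ in K]
        for (a, n), f in counts.items():
            joint = [p_c[c] * p_a[c][a] * p_n[c][n] for c in K]
            z = sum(joint)
            for c in K:
                acc_a[c][a] += f * joint[c] / z
                acc_n[c][n] += f * joint[c] / z
        # M-step: turn the expected counts back into probabilities.
        mass = [sum(acc_a[c].values()) for c in K]
        p_c = [m / sum(mass) for m in mass]
        p_a = [normalise(acc_a[c]) for c in K]
        p_n = [normalise(acc_n[c]) for c in K]
    return p_c, p_a, p_n


def sq_dist(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def gibbs_membership(x, centroids, T):
    """Soft assignment P(j | x) proportional to exp(-||x - y_j||^2 / T)."""
    d = [sq_dist(x, y) for y in centroids]
    d_min = min(d)  # the closest centroid gets weight 1, so the sum never underflows
    w = [math.exp(-(dj - d_min) / T) for dj in d]
    total = sum(w)
    return [wj / total for wj in w]


def soft_cluster(points, n_centroids, T, n_iter=100, seed=0):
    """Fixed-point iteration of simplified Rose-style soft clustering at temperature T."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_centroids)]
    for _ in range(n_iter):
        members = [gibbs_membership(x, centroids, T) for x in points]
        for j in range(n_centroids):
            mass = sum(m[j] for m in members)
            if mass > 0:  # keep a centroid in place if no point is assigned to it
                centroids[j] = [sum(m[j] * x[i] for m, x in zip(members, points)) / mass
                                for i in range(len(points[0]))]
    return centroids


def build_hierarchy(noun_vectors, sizes, temperatures):
    """One list of category centroids per (size, T) pair, from coarse to fine."""
    points = list(noun_vectors.values())
    return [soft_cluster(points, k, T) for k, T in zip(sizes, temperatures)]


def link_levels(upper, lower, T_upper):
    """P(upper category | lower category), treating each lower centroid as a point."""
    return [gibbs_membership(y, upper, T_upper) for y in lower]


# Hypothetical toy counts standing in for the newspaper dependency data.
counts = {("red", "apple"): 8, ("sweet", "apple"): 6, ("sweet", "strawberry"): 7,
          ("red", "strawberry"): 5, ("runs", "dog"): 9, ("barks", "dog"): 7,
          ("runs", "horse"): 8, ("fast", "horse"): 4}
p_c, p_a, p_n = fit_latent_classes(counts, n_classes=2)
# As in the abstract, each noun is represented by P(noun | latent class).
vectors = {n: [p_n[c][n] for c in range(2)] for n in p_n[0]}
levels = build_hierarchy(vectors, sizes=[2, 4], temperatures=[0.05, 0.001])
links = link_levels(levels[0], levels[1], T_upper=0.05)
```

In this sketch `links[i][j]` plays the role of the membership probability of lower-level category i in upper-level category j. Fixing T per level is the main simplification; under a proper annealing schedule, the number of distinct categories at each level would emerge from the temperature rather than being chosen in advance.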

Report

(2 results)

2007 Annual Research Report
2006 Annual Research Report

Research Products
(3 results)


Journal Article (1 result; peer reviewed: 1) / Presentation (2 results)

[Journal Article] Construction of a Probabilistic Hierarchical Structure based on a Japanese Corpus and a Japanese Thesaurus (2008)
- Author(s)
  Asuka Terai, Bin Liu, Masanori Nakagawa
- Journal Title
  
  T. Tokunaga and A. Ortega (Eds.): LKR 2008, LNAI 4938, Springer-Verlag Berlin Heidelberg
  
  Pages: 132-147
- Related Report
  2007 Annual Research Report
- Peer Reviewed
[Presentation] A method for the construction of a probabilistic hierarchical structure based on a statistical analysis of a large-scale corpus (2007)
- Author(s)
  Asuka Terai, Bin Liu, Masanori Nakagawa
- Organizer
  the 1st International Conference on Semantic Computing, IEEE Computer Society
- Place of Presentation
  Irvine (USA)
- Year and Date
  2007-09-17
- Related Report
  2007 Annual Research Report
[Presentation] Hierarchical Probabilistic Categorization of Japanese Words (2007)
- Author(s)
  Asuka Terai, Bin Liu, Masanori Nakagawa
- Organizer
  the 15th International and 72nd Annual Meeting of the Psychometric Society (IMPS2007)
- Place of Presentation
  Tokyo (Japan)
- Year and Date
  2007-07-09
- Related Report
  2007 Annual Research Report
