Nonparametric Bayesian Learning in Probabilistic Generative Models and Its Application to Natural Language Processing

Research Project

Project/Area Number	08J07036
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	Issei Sato (佐藤一誠), The University of Tokyo, Information Technology Center, JSPS Research Fellow (DC1)
Project Period (FY)	2008 – 2011
Project Status	Completed (Fiscal Year 2010)
Budget Amount	¥1,800,000 (Direct Cost: ¥1,800,000) Fiscal Year 2010: ¥600,000 (Direct Cost: ¥600,000) Fiscal Year 2009: ¥600,000 (Direct Cost: ¥600,000) Fiscal Year 2008: ¥600,000 (Direct Cost: ¥600,000)
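Keywords	Nonparametric Bayes / Probabilistic generative model / Power-law / Online learning / Latent Dirichlet Allocation / Succinct data structure / Quantum system / Bayesian learning / Variational Bayes / Graph structure / Clustering / Dirichlet process / Extraction of semantic relations between words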
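Research Abstract	This fiscal year produced three main results, which were reported in one journal article and two international conference presentations.
1. For probabilistic generative models of documents with discrete hidden states, we proposed a model in which the word occurrence distribution follows a power law. Experiments showed that, on data with inherent power-law behavior, the proposed model predicts unseen data substantially better than the existing model, Latent Dirichlet Allocation (LDA).
2. We proposed a fast deterministic sequential learning method for LDA. It is a deterministic online learning algorithm that discards each data item after processing it once, so past data need not be retained. It also converges quickly and does not require parallel execution.
3. We proposed a mining algorithm for compressed semi-structured data that uses succinct data structures. Tree-structured semi-structured data, chiefly XML, has grown rapidly in recent years, and an algorithm called FREQT has been proposed for quickly extracting frequent patterns from such data. In this work we proposed an algorithm that can apply FREQT while the tree-structured data remains compressed to the information-theoretic lower bound.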
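As an illustration of the power-law behavior mentioned in result 1, the following is a minimal sketch of the Chinese restaurant process view of the Pitman-Yor process, the prior named in the 2010 presentation title. It is not the project's topic model: the function name `crp_table_sizes` and the parameter values are assumptions chosen for illustration. Setting the discount to 0 recovers the Dirichlet process, whose number of tables grows only logarithmically, whereas a positive discount yields power-law growth.

```python
import random


def crp_table_sizes(n_customers, discount, concentration, seed=0):
    """Seat customers one by one and return the final table sizes.

    Illustrative sketch only (hypothetical helper, not from the project).
    discount = 0 gives the Dirichlet process; 0 < discount < 1 gives the
    Pitman-Yor process.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= 0.0:
        raise ValueError("this sketch assumes concentration > 0")

    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of customers at table k
    for n in range(n_customers):
        # Probability that customer n+1 opens a new table.
        p_new = (concentration + discount * len(sizes)) / (n + concentration)
        if rng.random() < p_new:
            sizes.append(1)
        else:
            # Existing table k is chosen with weight (sizes[k] - discount).
            weights = [s - discount for s in sizes]
            k = rng.choices(range(len(sizes)), weights=weights)[0]
            sizes[k] += 1
    return sizes


if __name__ == "__main__":
    dp = crp_table_sizes(5000, discount=0.0, concentration=1.0, seed=1)
    py = crp_table_sizes(5000, discount=0.5, concentration=1.0, seed=1)
    print("Dirichlet process tables:", len(dp))
    print("Pitman-Yor process tables:", len(py))
```

If tables are read as word types and customers as word tokens, the Pitman-Yor run produces many more small tables than the Dirichlet process run, which is the long-tailed shape typical of word frequencies.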

Report

(3 results)

Research Products

(5 results)

Journal Article (1 result, of which Peer Reviewed: 1) / Presentation (4 results)

[Journal Article] Succinct Semi-structured Data Mining Based on FREQT (2010)
- Author(s)
  Issei Sato, Hiroshi Nakagawa
- Journal Title
  日本データベース学会論文誌(英文) [Journal of the Database Society of Japan, English edition]
  Volume: 9 Pages: 76-81
- Related Report
  2010 Annual Research Report
- Peer Reviewed
[Presentation] Deterministic Single-Pass Algorithm for LDA (2010)
- Author(s)
  Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
- Organizer
  Neural Information Processing Systems Conference
- Place of Presentation
  Vancouver, Canada
- Year and Date
  2010-12-06
- Related Report
  2010 Annual Research Report
[Presentation] Topic Models with Power-Law Using Pitman-Yor Process (2010)
- Author(s)
  Issei Sato, Hiroshi Nakagawa
- Organizer
  ACM International Conference on Knowledge Discovery and Data Mining
- Place of Presentation
  Washington, DC, USA
- Year and Date
  2010-07-26
- Related Report
  2010 Annual Research Report
[Presentation] Quantum Annealing for Variational Bayes Inference (2009)
- Author(s)
  Issei Sato
- Organizer
  The 25th Conference on Uncertainty in Artificial Intelligence
- Place of Presentation
  Montreal, Canada
- Year and Date
  2009-06-19
- Related Report
  2009 Annual Research Report
[Presentation] Knowledge Discovery of Semantic Relationships between Words Using Nonparametric Bayesian Graph Model (2008)
- Author(s)
  Issei Sato
- Organizer
  ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- Place of Presentation
  Las Vegas, USA
- Year and Date
  2008-08-24
- Related Report
  2008 Annual Research Report
