Improvement of topic-based language models using Dirichlet mixtures and their applications
Project/Area Number | 17500105 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Perception information processing/Intelligent robotics |
Research Institution | University of Tsukuba |
Principal Investigator | YAMAMOTO Mikio, University of Tsukuba, Graduate School of Systems and Information Engineering, Associate Professor (40210562) |
Project Period (FY) | 2005 – 2006 |
Project Status | Completed (Fiscal Year 2006) |
Budget Amount | ¥3,700,000 (Direct Cost: ¥3,700,000)
Fiscal Year 2006: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2005: ¥2,300,000 (Direct Cost: ¥2,300,000) |
Keywords | Dirichlet Mixtures / statistical language models / topic-based models / Bayesian statistics / speech recognition / statistical machine translation / cross-language models / Bayesian models |
Research Abstract |
To improve statistical language models, we enhanced the predictive power of n-gram models, the most common type of language model, using topic and context information. We proposed new estimation methods for Dirichlet mixtures and evaluated the model on two applications: speech recognition and statistical machine translation.
1. We developed a robust estimation method for Dirichlet mixture language models using hierarchical Bayesian models. To approximate the integrals that appear in Bayesian inference, we used the reversing-EM algorithm and variational approximation. In experiments on various text corpora, we showed that this estimation method achieves the lowest perplexity.
2. Our model was integrated into speech recognition systems and evaluated by recognition rate. Two integration methods were developed: (1) modifying trigram probabilities via unigram rescaling, and (2) document-level optimization using the document likelihood computed by our model. Comparing Latent Dirichlet Allocation (LDA) with our model, we showed that the recognition rate of the system using our model is higher than that of the LDA-based system.
3. We proposed cross-language Dirichlet mixture models and integrated them into phrase-based statistical machine translation systems. With this model, the system can select contextually and topically appropriate Japanese words from among the translation candidates for an English input document. Experiments on newspaper-article translation showed that the topic models were effective in lowering perplexity.
|
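The unigram rescaling mentioned in item 2 adapts a context probability p(w | h) toward a topic model's document-level unigram distribution by weighting it with the ratio p_topic(w | d) / p_uni(w) and renormalizing. A minimal sketch of that computation, using toy probability tables (not values from this project):

```python
# Unigram rescaling: bias a trigram probability p(w | h) toward a
# topic model's document unigram p_topic(w | d) by multiplying in the
# ratio p_topic(w | d) / p_uni(w), then renormalizing over the vocabulary.
# All distributions below are hypothetical toy values for illustration.

def unigram_rescale(p_trigram, p_topic, p_uni):
    """Return topic-adapted probabilities over the vocabulary."""
    scores = {w: p_trigram[w] * (p_topic[w] / p_uni[w]) for w in p_trigram}
    z = sum(scores.values())  # renormalize so probabilities sum to 1
    return {w: s / z for w, s in scores.items()}

# Toy distributions over a three-word vocabulary.
p_trigram = {"bank": 0.5, "river": 0.3, "loan": 0.2}   # p(w | h)
p_uni     = {"bank": 0.4, "river": 0.3, "loan": 0.3}   # background unigram
p_topic   = {"bank": 0.2, "river": 0.7, "loan": 0.1}   # topic unigram for document d

adapted = unigram_rescale(p_trigram, p_topic, p_uni)
print(adapted)  # mass shifts toward "river" for a river-topic document
```

In the project's setting the topic distribution p_topic(w | d) would come from the Dirichlet mixture model rather than the fixed table used here.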
Report (3 results)
Research Products (11 results)