
Improvement of topic-based language models using Dirichlet mixtures and their applications

Research Project

Project/Area Number 17500105
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Perception information processing/Intelligent robotics
Research Institution University of Tsukuba

Principal Investigator

YAMAMOTO Mikio  University of Tsukuba, Graduate School of Systems and Information Engineering, Associate Professor (40210562)

Project Period (FY) 2005 – 2006
Project Status Completed (Fiscal Year 2006)
Budget Amount
¥3,700,000 (Direct Cost: ¥3,700,000)
Fiscal Year 2006: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2005: ¥2,300,000 (Direct Cost: ¥2,300,000)
Keywords Dirichlet Mixtures / statistical language models / topic-based models / Bayesian statistics / speech recognition / statistical machine translation / cross-language models / Bayesian models
Research Abstract

To improve statistical language models, we enhanced the predictive power of n-gram models, the most common type of language model, using topic and context information. We proposed new estimation methods for Dirichlet mixtures and evaluated the model on two applications: speech recognition and statistical machine translation.
1. We developed a robust estimation method for Dirichlet mixture language models using hierarchical Bayesian models. To approximate the integrals that appear in Bayesian inference, we used the reversing-EM algorithm and variational approximation. In experiments on various text corpora, we showed that this estimation method achieves the lowest perplexity.
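The model family in item 1 is a mixture of Dirichlet priors over unigram distributions, whose document likelihood is the Dirichlet-compound-multinomial (Pólya) distribution. The sketch below is a minimal, generic illustration of how such a mixture predicts the next word from a document's observed counts; it is not the project's estimation code, and all parameter values are made up.

```python
import math

def log_polya(counts, alpha):
    """Log Dirichlet-compound-multinomial (Polya) likelihood of the word
    counts of one document under a Dirichlet component with parameters alpha."""
    n, a0 = sum(counts), sum(alpha)
    ll = math.lgamma(a0) - math.lgamma(a0 + n)
    for c, a in zip(counts, alpha):
        ll += math.lgamma(a + c) - math.lgamma(a)
    return ll

def predictive_unigram(counts, weights, alphas):
    """Predictive word distribution given a document's counts: mix each
    component's smoothed estimate by its posterior responsibility."""
    logs = [math.log(w) + log_polya(counts, a) for w, a in zip(weights, alphas)]
    m = max(logs)                      # log-sum-exp for numerical stability
    resp = [math.exp(l - m) for l in logs]
    z = sum(resp)
    resp = [r / z for r in resp]
    n = sum(counts)
    return [
        sum(r * (counts[v] + a[v]) / (n + sum(a)) for r, a in zip(resp, alphas))
        for v in range(len(counts))
    ]

# Toy setting: two components over a 3-word vocabulary, one biased toward
# word 0 and one toward word 1; the document is dominated by word 0.
alphas = [[5.0, 1.0, 1.0], [1.0, 5.0, 1.0]]
weights = [0.5, 0.5]
p = predictive_unigram([4, 0, 1], weights, alphas)
```

Because the first component explains the counts far better, it receives almost all of the posterior responsibility, and the predictive distribution shifts toward word 0.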
2. Our model was integrated into speech recognition systems and evaluated by recognition rate. Two integration methods were developed: (1) modifying the probabilities of trigram models via unigram rescaling, and (2) document-level optimization using the document likelihood computed by our model. Comparing Latent Dirichlet Allocation (LDA) with our model, we showed that the speech recognition rate of the system using our model is higher than that of the LDA-based system.
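Unigram rescaling, integration method (1), is a standard way to combine a topic model with an n-gram model: each trigram probability is reweighted by the ratio of the topic model's unigram probability to the background unigram probability, then renormalized over the candidate words. A minimal sketch with made-up probabilities:

```python
def unigram_rescale(p_trigram, p_topic, p_unigram):
    """p(w | history, doc) proportional to
    p_tri(w | history) * p_topic(w | doc) / p_uni(w), renormalized."""
    scores = {w: p_trigram[w] * p_topic[w] / p_unigram[w] for w in p_trigram}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

# Toy vocabulary: the topic model has inferred a finance-like context,
# so it favors "bank" relative to the background unigram model.
p_tri = {"bank": 0.2, "river": 0.2, "the": 0.6}
p_uni = {"bank": 0.1, "river": 0.1, "the": 0.8}
p_topic = {"bank": 0.3, "river": 0.05, "the": 0.65}
p = unigram_rescale(p_tri, p_topic, p_uni)
```

Here "bank" ends up with more probability than the plain trigram model gave it, which is exactly the topic-adaptation effect the integration exploits.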
3. We proposed cross-language Dirichlet mixture models and integrated them into phrase-based statistical machine translation systems. With this model, the system can select contextually or topically correct Japanese words from the translation candidates for an English input document. Experiments on newspaper article translation showed that the topic models were effective in lowering perplexity.
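Perplexity, the evaluation measure used in items 1 and 3, is the exponentiated negative mean log-probability a model assigns to the test words; lower is better. A minimal sketch with toy numbers (not the project's data):

```python
import math

def perplexity(word_probs):
    """Perplexity over a test sequence, given the probability the model
    assigned to each word: exp(-(1/N) * sum(log p_i))."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

# A uniform model over a 1000-word vocabulary assigns 1/1000 everywhere,
# so its perplexity is 1000; a topic-adapted model that concentrates
# probability mass on in-context words scores lower.
uniform = perplexity([1 / 1000] * 50)
adapted = perplexity([0.01] * 25 + [0.002] * 25)
```

The geometric-mean structure means perplexity rewards a model for being confidently right about context-appropriate words, which is why topic conditioning helps.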

Report

(3 results)
  • 2006 Annual Research Report
  • 2006 Final Research Report Summary
  • 2005 Annual Research Report
  • Research Products

    (11 results)

Journal Article (10 results)   Book (1 result)

  • [Journal Article] A word-order priority search decoder for English-Japanese statistical machine translation (2006)

    • Author(s)
      Hayato Iwakoshi
    • Journal Title

      Transactions of IPSJ, Vol. 47, No. 11

      Pages: 3032-3040

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2006 Annual Research Report 2006 Final Research Report Summary
  • [Journal Article] Document level optimization in speech recognition (2006)

    • Author(s)
      Rie NAKAZATO
    • Journal Title

      The 4th meeting of ASA and ASJ 2006

      Pages: 7-7

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2006 Annual Research Report 2006 Final Research Report Summary
  • [Journal Article] Relevance feedback models for recommendation (2006)

    • Author(s)
      Masao UTIYAMA
    • Journal Title

      The Proceedings of the 2006 Conference on Empirical Methods in NLP

      Pages: 305-313

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2006 Annual Research Report 2006 Final Research Report Summary
  • [Journal Article] Reordering priority decoder for statistical machine translation (2006)

    • Author(s)
      Hayato Iwakoshi
    • Journal Title

      Transactions of IPSJ, Vol. 47, No. 11

      Pages: 3032-3040

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Document level optimization in speech recognition (2006)

    • Author(s)
      Rie Nakazato
    • Journal Title

      The 4th meeting of ASA and ASJ

      Pages: 7-7

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Relevance feedback models for recommendation (2006)

    • Author(s)
      Masao Utiyama
    • Journal Title

      The Proceedings of the 2006 Conference on Empirical Methods in NLP

      Pages: 305-313

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Topic-based language models using Dirichlet mixtures (2005)

    • Author(s)
      Kugatsu Sadamitsu
    • Journal Title

      IEICE Transactions, Vol. J88-D-II, No. 9

      Pages: 1771-1779

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Topic-based language models using Dirichlet Mixtures (2005)

    • Author(s)
      Kugatsu Sadamitsu
    • Journal Title

      The IEICE Transactions on Information and Systems, Pt. 2, Vol. J87-D-II, No. 7

      Pages: 1771-1779

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Spoken language resources (2005)

    • Author(s)
      Shuichi Itahashi
    • Journal Title

      Spoken Language Systems (S. Nakagawa et al., Eds.), Ohmsha, Chapter 8

      Pages: 317-331

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2006 Final Research Report Summary
  • [Journal Article] Topic-based language models using Dirichlet mixtures (2005)

    • Author(s)
      Kugatsu Sadamitsu
    • Journal Title

      IEICE Transactions, Vol. J88-D-II, No. 9

      Pages: 1771-1779

    • Related Report
      2005 Annual Research Report
  • [Book] Spoken Language Systems (Chapter 8, "Spoken Language Resources") (2005)

    • Author(s)
      S. Nakagawa et al. (Eds.)
    • Publisher
      Ohmsha
    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2006 Final Research Report Summary


Published: 2005-04-01   Modified: 2016-04-21  
