Large Vocabulary Continuous Speech Recognition System on Japanese Newspaper Reading Task

Research Project

Project/Area Number 10680368
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Yamagata University

Principal Investigator

KOHDA Masaki  Yamagata University, Faculty of Engineering, Professor (00205337)

Co-Investigator (Kenkyū-buntansha) KATOH Masaharu  Yamagata University, Faculty of Engineering, Research Associate (10250953)
ITO Akinori  Yamagata University, Faculty of Engineering, Associate Professor (70232428)
Project Period (FY) 1998 – 2000
Project Status Completed (Fiscal Year 2000)
Budget Amount
¥3,300,000 (Direct Cost: ¥3,300,000)
Fiscal Year 2000: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 1999: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 1998: ¥2,200,000 (Direct Cost: ¥2,200,000)
Keywords Large Vocabulary Continuous Speech Recognition / Acoustic Model / Language Model / Decoder / Hidden Markov Net / N-gram / Speaker Adaptation / Task Adaptation / Class N-gram / Perplexity / Word Error Rate / Ergodic HMM / Multi-pass Search / Phoneme Graph / Word Graph / HM-Net / SCFG / MLLR Speaker Adaptation / LPC Mel-cepstrum / triphone / N-gram Language Model / Newspaper Article Read Speech
Research Abstract

We investigated a large vocabulary continuous speech recognition (LVCSR) system for a Japanese newspaper reading task and obtained the following results.
(1) Acoustic models: A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model that represents the tied-state structure of context-dependent Hidden Markov Models as a network. We propose a rapid topology design method based on state clustering to generate high-accuracy HM-Nets for LVCSR. Furthermore, we investigate MLLR (Maximum Likelihood Linear Regression)-based speaker adaptation of acoustic models and propose a regression class selection algorithm based on the BIC principle.
(2) Language models: We investigate an N-gram task adaptation method that uses a large corpus of the general task (TI text) and a small corpus of the specific task (AD text), and mixes the TI and AD texts with a simple weighting. Furthermore, we propose a new SCFG (Stochastic Context Free Grammar) model that uses a phrase-based dependency grammar instead of a general CFG. The word error rate obtained with a mixture model based on the proposed SCFG model and a trigram is lower than that obtained with the trigram alone.
(3) Decoder: We investigate fast search strategies for LVCSR and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage. At the main recognition stage, the best word sequence is then searched for while the expansion of hypotheses is restricted using the information in the phoneme graph. In a multiple-pass LVCSR system that uses a word graph as an intermediate data structure, the decoder parameters should be optimized in order to generate a good word graph. We propose a new method to optimize these parameters: it rescores the word graph with a bigram LM instead of generating a new word graph for every parameter setting.
(4) Software tool: We describe a statistical language model toolkit for word and class-based n-grams. The toolkit has command-level compatibility with the CMU-Cambridge SLM Toolkit, and it supports class n-grams and n-gram count mixture as well as combined language models using linear interpolation.
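The count-mixture idea mentioned in items (2) and (4) can be illustrated with a minimal Python sketch. It is not the project's toolkit or its actual estimation procedure: the function names, the toy sentences, the weight value, and the unsmoothed maximum-likelihood estimate are all assumptions made for illustration. The sketch simply adds the n-gram counts of a small task-specific text, scaled by a weight, to the counts of a large general text before estimating probabilities.

```python
from collections import Counter


def ngram_counts(sentences, n=2):
    """Count n-grams in tokenized sentences, padded with <s> and </s>."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


def mix_counts(general, specific, weight):
    """Add task-specific counts, scaled by `weight`, to the general counts."""
    mixed = Counter()
    for gram, c in general.items():
        mixed[gram] += c
    for gram, c in specific.items():
        mixed[gram] += weight * c
    return mixed


def conditional_prob(counts, history, word):
    """Unsmoothed maximum-likelihood P(word | history).

    A real system would apply discounting and back-off here; this
    simplification is an assumption of the sketch.
    """
    total = sum(c for gram, c in counts.items() if gram[:-1] == history)
    if total == 0:
        return 0.0
    return counts[history + (word,)] / total


# Toy corpora (hypothetical): a larger general text and a small task text.
general_text = [["the", "market", "fell"],
                ["the", "market", "rose"],
                ["the", "team", "won"]]
task_text = [["the", "team", "lost"]]

general = ngram_counts(general_text)
mixed = mix_counts(general, ngram_counts(task_text), weight=3.0)

print(conditional_prob(general, ("the",), "team"))
print(conditional_prob(mixed, ("the",), "team"))
```

In this toy setting, weighting the task text by 3 raises P(team | the) from 1/3 to 2/3, which shows how a small amount of task-specific text can shift the estimates of a model trained mostly on general text. How the weight should be chosen is not stated in the abstract and is left open here.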

Report

(4 results)
  • 2000 Annual Research Report   Final Research Report Summary
  • 1999 Annual Research Report
  • 1998 Annual Research Report

Research Products

(49 results)

All Publications (49 results)

  • [Publications] 堀貴明: "状態クラスタリングによるHM-Netの構造決定法の検討"電子情報通信学会論文誌(D-II). J81-D-II. 2239-2248 (1998)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] 堀貴明: "連続/セミ連続分布型HMMによる単語音声認識のViterbi best-firstサーチにおける推定スコア設定法の検討"電子情報通信学会論文誌(D-II). J81-D-II. 2526-2534 (1998)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] 堀貴明: "大語彙連続音声認識のための音素グラフに基づく仮説制限法の検討"情報処理学会論文誌. 40. 1365-1373 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] 堀智織: "音声認識のための確率文脈自由文法に基づく言語モデルの構築と評価"電子情報通信学会論文誌(D-II). J83-D-II. 2407-2417 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] 伊藤彰則: "N-gram出現回数の混合によるタスク適応の性能解析"電子情報通信学会論文誌(D-II). J83-D-II. 2418-2427 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] 伊藤彰則: "A new metric for stochastic language model evaluation"Euro.Conf.on Speech Commu.and Technology. Vol.4. 1591-1594 (1999)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] 伊藤彰則: "Language modeling by stochastic dependency grammar for Japanese speech recognition"International Conf.on Spoken Language Processing. Vol.1. 246-249 (2000)

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Hori, M.Katoh, A.Ito, M.Kohda: "A Study on a State Clustering-Based Topology Design Method for HM-Nets"Trans. IEICE (D-II). Vol.J81-D-II, No.10. 2239-2248 (1998)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Hori, M.Katoh, M.Kohda: "A Study on Heuristic Score Estimation in Viterbi Best-First Search for Isolated Word Recognition Using Continuous/Semi-Continuous HMMs"Trans. IEICE (D-II). Vol.J81-D-II, No.11. 2526-2534 (1998)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] T.Hori, N.Oka, M.Katoh, A.Ito, M.Kohda: "A Study on a Phoneme-graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition"Trans. IPSJ.. Vol.40, No.4. 1365-1373 (1999)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] C.Hori, M.Katoh, A.Ito, M.Kohda: "Construction and Evaluation of Language Models Based on Stochastic Context Free Grammar for Speech Recognition"Trans. IEICE (D-II). Vol.J83-D-II, No.11. 2407-2417 (2000)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] A.Ito, M.Kohda: "Evaluation of Task Adaptation Using N-Gram Count Mixture"Trans. IEICE (D-II). Vol.J83-D-II, No.11. 2418-2427 (2000)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] A.Ito, M.Kohda, M.Ostendorf: "A New Metric for Stochastic Language Model Evaluation"Proc. Euro. Conf. on Speech Commu. and Technology. Vol.4. 1591-1594 (1999)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] A.Ito, C.Hori, M.Katoh, M.Kohda: "Language Modeling by Stochastic Dependency Grammar for Japanese Speech Recognition"Proc. International Conf. on Spoken Language Processing. Vol.1. 246-249 (2000)

    • Description
      From the Final Research Report Summary (English version)
    • Related Report
      2000 Final Research Report Summary
  • [Publications] 堀智織: "音声認識のための確率文脈自由文法に基づく言語モデルの構築と評価"電子情報通信学会論文誌(D-II). J83-D-II,11. 2407-2417 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 伊藤彰則: "N-gram出現回数の混合によるタスク適応の性能解析"電子情報通信学会論文誌(D-II). J83-D-II,11. 2418-2427 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 伊藤彰則: "Language modeling by stochastic dependency grammar for Japanese speech recognition"Proceedings of ICSLP 2000. Vol.1,M1-24. 246-249 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 加藤正治: "話者照合におけるMLLRベースの話者モデル作成の検討"電子情報通信学会技術研究報告. SP2000-19. 25-32 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 伊藤彰則: "N-gramに基づくエルゴディックHMMによる言語モデル"電子情報通信学会技術研究報告. SP2000-25. 67-74 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 斎院俊典: "単語グラフ生成の言語重み・挿入ペナルティ最適化の検討"電子情報通信学会技術研究報告. SP2000-26. 75-82 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 加藤正治: "単語グラフ生成におけるパラメータ最適化の検討"電子情報通信学会技術研究報告. SP2000-93. 107-112 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 伊藤彰則: "単語およびクラスN-gram作成のためのツールキット"電子情報通信学会技術研究報告. SP2000-106. 67-72 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 斎院俊典: "自然発話文の大語彙連続音声認識"情報処理学会東北支部研究会. 2000-4-13. 1-8 (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] 田嶋昇: "HM-Net音響モデルを用いる話者照合"情報処理学会東北支部研究会. 2000-4-14. 1-7 (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] 竹内亜未: "確率文脈自由文法に基づく言語モデル"情報処理学会東北支部研究会. 2000-4-15. 1-8 (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] 竹内亜未: "確率文脈自由文法に基づく言語モデル"電気関係学会東北支部連合大会. 2A-3. 18 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 田嶋昇: "HM-Net音響モデルを用いる話者照合"電気関係学会東北支部連合大会. 2A-13. 28 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 夏井武雄: "セグメント単位入力HMMに基づく音声認識"電気関係学会東北支部連合大会. 2A-14. 29 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 加藤正治: "単語グラフ生成におけるパラメータ最適化の検討"日本音響学会講演論文集. 1-5-17. 33-34 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] 伊藤彰則: "単語およびクラスN-gram作成のための統計的言語モデルツールキット"日本音響学会講演論文集. 2-3-12. 77-78 (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] 堀 貴明: "大語彙連続音声認識のための音素グラフに基づく仮説制限法の検討"情報処理学会論文誌. 40,4. 1365-1373 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 伊藤 彰則: "尤度差に基づくn-gram言語モデル評価のための指標"電子情報通信学会技術研究報告. SP99-39. 95-102 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 堀 智織: "確率文脈自由文法を用いた言語モデルの構築と音声認識実験による評価"電子情報通信学会技術研究報告. SP99-37. 79-86 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 伊藤 彰則: "A new metric for stochastic language model evaluation"Eurospeech '99. S8.po1. 1591-1594 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 加藤 正治: "複数の認識出力の統合による性能改善の検討"日本音響学会講演論文集. 2-1-16. 85-86 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 斎藤 秀樹: "bigram に基づく ergodic HMM による言語モデルの検討"日本音響学会講演論文集. 3-1-3. 101-102 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 伊藤 彰則: "N-best 候補からの言語重みと挿入ペナルティの最適化に関する検討"情報処理学会研究報告. 99-SLP-28-6. 35-40 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 岡 直生: "音素グラフに基づく仮説制限法を用いた大語彙連続音声認識の検討"電子情報通信学会技術研究報告. SP99-126. 67-72 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 加納 淳也: "話者照合における話者モデルの MLLR 適応の検討"電子情報通信学会技術研究報告. SP99-102. 55-60 (1999)

    • Related Report
      1999 Annual Research Report
  • [Publications] 斎院 俊典: "単語グラフ生成の言語重み・挿入ペナルティ最適化の検討"日本音響学会講演論文集. 2-8-12. 47-48 (2000)

    • Related Report
      1999 Annual Research Report
  • [Publications] 加納 淳也: "MLLR 適応における MDL 基準に基づく回帰クラスタ設定の検討"日本音響学会講演論文集. 3-9-5. 103-104 (2000)

    • Related Report
      1999 Annual Research Report
  • [Publications] 斎藤 秀樹: "Trigram に基づく Ergodic HMM による言語モデルの検討"日本音響学会講演論文集. 2-8-12. 51-52 (2000)

    • Related Report
      1999 Annual Research Report
  • [Publications] 小笠原 教充: "品詞と高頻度単語の N-gram を使用したタスク適応の検討"日本音響学会講演論文集. 3-8-5. 75-76 (2000)

    • Related Report
      1999 Annual Research Report
  • [Publications] 堀 貴明: "状態クラスタリングによるHM-Netの構造決定法の検討" 電子情報通信学会論文誌. J81-D-II,10. 2239-2248 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] 堀 貴明: "連続/セミ連続分布型HMMによる単語音声認識のViterbi best-firstサーチにおける推定スコア設定法の検討" 電子情報通信学会論文誌. J81-D-II,11. 2526-2534 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] 堀 貴明: "大語彙連続音声認識のための音素グラフに基づく仮説制限法の検討" 電子情報通信学会技術研究報告. SP98-111. 25-32 (1998)

    • Related Report
      1998 Annual Research Report
  • [Publications] 斎院俊典: "音素と音節を単位とするHM-Net音響モデルの検討" 情報処理学会東北支部研究会. 98-4-1. 1-8 (1999)

    • Related Report
      1998 Annual Research Report
  • [Publications] 鈴木健市: "大語彙連続音声認識におけるデコーダの評価" 情報処理学会東北支部研究会. 98-4-3. 17-24 (1999)

    • Related Report
      1998 Annual Research Report
  • [Publications] 亀山誠裕: "新聞記事コーパスからのN-gram言語モデル作成と音声認識実験による評価" 情報処理学会東北支部研究会. 98-4-4. 25-32 (1999)

    • Related Report
      1998 Annual Research Report

Published: 1998-04-01   Modified: 2016-04-21  