2000 Fiscal Year Final Research Report Summary

Large Vocabulary Continuous Speech Recognition System on Japanese Newspaper Reading Task

Research Project

Project/Area Number 10680368
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Yamagata University

Principal Investigator

KOHDA Masaki  Yamagata University, Faculty of Engineering, Professor (00205337)

Co-Investigator (Kenkyū-buntansha) KATOH Masaharu  Yamagata University, Faculty of Engineering, Research Associate (10250953)
ITO Akinori  Yamagata University, Faculty of Engineering, Associate Professor (70232428)
Project Period (FY) 1998 – 2000
Keywords Large Vocabulary Continuous Speech Recognition / Acoustic Model / Language Model / Decoder / Hidden Markov Net / N-gram / Speaker Adaptation / Task Adaptation
Research Abstract

We investigated a large vocabulary continuous speech recognition (LVCSR) system for a Japanese newspaper reading task and obtained the following results.
(1) Acoustic models: A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model that represents the tied-state structure of context-dependent hidden Markov models as a network. We propose a rapid topology design method based on state clustering to generate high-accuracy HM-Nets for LVCSR. We also investigate speaker adaptation of acoustic models based on MLLR (Maximum Likelihood Linear Regression) and propose a regression class selection algorithm based on the BIC (Bayesian Information Criterion) principle.
(2) Language models: We investigate an N-gram task adaptation method that uses a large corpus of the general task (TI text) and a small corpus of the specific task (AD text), mixing the two with a simple weighting. We also propose a new SCFG (Stochastic Context-Free Grammar) model that uses a phrase-based dependency grammar instead of a general CFG. A mixture of the proposed SCFG model and a trigram model yields a lower word error rate than the trigram model alone.
(3) Decoder: We investigate fast search strategies for LVCSR and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated in a pre-processing stage; in the main recognition stage, the best word sequence is then searched while the expansion of hypotheses is restricted using information from the phoneme graph. In a multiple-pass LVCSR system that uses a word graph as an intermediate data structure, the decoder parameters should be optimized so that a good word graph is generated. We propose a new method to optimize these parameters, which rescores the word graph with a bigram LM instead of generating many word graphs for each parameter setting.
(4) Software tool: We describe a statistical language model toolkit for word-based and class-based n-grams. The toolkit is command-level compatible with the CMU-Cambridge SLM Toolkit and supports class n-grams and n-gram count mixture as well as combined language models using linear interpolation.
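
As an illustration of the count mixture in (2) and the linear interpolation in (4), the following minimal Python sketch mixes bigram counts from a general text and a task-specific text with a simple weight, and separately interpolates two model probabilities. The function names, the toy texts, and the weights are assumptions made for this example; they are not taken from the project's toolkit, and the sketch omits the smoothing that a practical n-gram model would need.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def mix_counts(general, specific, weight):
    """Count mixture: add task-specific counts, scaled by `weight`, to the general counts."""
    mixed = Counter(general)
    for ngram, count in specific.items():
        mixed[ngram] += weight * count
    return mixed


def bigram_prob(counts, history, word):
    """Unsmoothed maximum-likelihood P(word | history) from bigram counts."""
    total = sum(c for (h, _), c in counts.items() if h == history)
    return counts[(history, word)] / total if total else 0.0


def interpolate(p_a, p_b, lam):
    """Linear interpolation of two model probabilities with weight `lam` on the first."""
    return lam * p_a + (1.0 - lam) * p_b


# Toy data (hypothetical): a "general" text and a small "task-specific" text.
general = bigram_counts("the market rose the market fell".split())
specific = bigram_counts("the weather cleared".split())

# Weight the small task-specific text three times as heavily as the general text.
mixed = mix_counts(general, specific, weight=3)
p_mixed = bigram_prob(mixed, "the", "weather")

# Alternative: keep two separate models and interpolate their probabilities.
p_interp = interpolate(
    bigram_prob(general, "the", "weather"),
    bigram_prob(specific, "the", "weather"),
    lam=0.5,
)
```

The count mixture combines the texts before probabilities are estimated, so the weight directly controls how much the small task-specific text influences every estimate. Linear interpolation instead combines the outputs of two separately estimated models.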

  • Research Products

    (14 results)

All Publications (14 results)

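  • [Publications] Takaaki Hori: "A Study on a State Clustering-Based Topology Design Method for HM-Nets" Trans. IEICE (D-II). J81-D-II. 2239-2248 (1998)

    • Description
      From the Final Research Report Summary (Japanese)
  • [Publications] Takaaki Hori: "A Study on Heuristic Score Estimation in Viterbi Best-First Search for Isolated Word Recognition Using Continuous/Semi-Continuous HMMs" Trans. IEICE (D-II). J81-D-II. 2526-2534 (1998)

    • Description
      From the Final Research Report Summary (Japanese)
  • [Publications] Takaaki Hori: "A Study on a Phoneme-graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition" Trans. IPSJ. 40. 1365-1373 (1999)

    • Description
      From the Final Research Report Summary (Japanese)
  • [Publications] Chiori Hori: "Construction and Evaluation of Language Models Based on Stochastic Context Free Grammar for Speech Recognition" Trans. IEICE (D-II). J83-D-II. 2407-2417 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
  • [Publications] Akinori Ito: "Evaluation of Task Adaptation Using N-Gram Count Mixture" Trans. IEICE (D-II). J83-D-II. 2418-2427 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
  • [Publications] Akinori Ito: "A new metric for stochastic language model evaluation" Euro. Conf. on Speech Commu. and Technology. Vol.4. 1591-1594 (1999)

    • Description
      From the Final Research Report Summary (Japanese)
  • [Publications] Akinori Ito: "Language modeling by stochastic dependency grammar for Japanese speech recognition" International Conf. on Spoken Language Processing. Vol.1. 246-249 (2000)

    • Description
      From the Final Research Report Summary (Japanese)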
  • [Publications] T. Hori, M. Katoh, A. Ito, M. Kohda: "A Study on a State Clustering-Based Topology Design Method for HM-Nets" Trans. IEICE (D-II). Vol.J81-D-II, No.10. 2239-2248 (1998)

    • Description
      From the Final Research Report Summary (English)
  • [Publications] T. Hori, M. Katoh, M. Kohda: "A Study on Heuristic Score Estimation in Viterbi Best-First Search for Isolated Word Recognition Using Continuous/Semi-Continuous HMMs" Trans. IEICE (D-II). Vol.J81-D-II, No.11. 2526-2534 (1998)

    • Description
      From the Final Research Report Summary (English)
  • [Publications] T. Hori, N. Oka, M. Katoh, A. Ito, M. Kohda: "A Study on a Phoneme-graph-based Hypothesis Restriction for Large Vocabulary Continuous Speech Recognition" Trans. IPSJ. Vol.40, No.4. 1365-1373 (1999)

    • Description
      From the Final Research Report Summary (English)
  • [Publications] C. Hori, M. Katoh, A. Ito, M. Kohda: "Construction and Evaluation of Language Models Based on Stochastic Context Free Grammar for Speech Recognition" Trans. IEICE (D-II). Vol.J83-D-II, No.11. 2407-2417 (2000)

    • Description
      From the Final Research Report Summary (English)
  • [Publications] A. Ito, M. Kohda: "Evaluation of Task Adaptation Using N-Gram Count Mixture" Trans. IEICE (D-II). Vol.J83-D-II, No.11. 2418-2427 (2000)

    • Description
      From the Final Research Report Summary (English)
  • [Publications] A. Ito, M. Kohda, M. Ostendorf: "A New Metric for Stochastic Language Model Evaluation" Proc. Euro. Conf. on Speech Commu. and Technology. Vol.4. 1591-1594 (1999)

    • Description
      From the Final Research Report Summary (English)
  • [Publications] A. Ito, C. Hori, M. Katoh, M. Kohda: "Language Modeling by Stochastic Dependency Grammar for Japanese Speech Recognition" Proc. International Conf. on Spoken Language Processing. Vol.1. 246-249 (2000)

    • Description
      From the Final Research Report Summary (English)

Published: 2002-03-26  
