Project/Area Number |
10680368
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Yamagata University |
Principal Investigator |
KOHDA Masaki Yamagata University, Faculty of Engineering, Professor (00205337)
|
Co-Investigator(Kenkyū-buntansha) |
KATOH Masaharu Yamagata University, Faculty of Engineering, Research Associate (10250953)
ITO Akinori Yamagata University, Faculty of Engineering, Associate Professor (70232428)
|
Project Period (FY) |
1998 – 2000
|
Project Status |
Completed (Fiscal Year 2000)
|
Budget Amount |
¥3,300,000 (Direct Cost: ¥3,300,000)
Fiscal Year 2000: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 1999: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 1998: ¥2,200,000 (Direct Cost: ¥2,200,000)
|
Keywords | Large Vocabulary Continuous Speech Recognition / Acoustic Model / Language Model / Decoder / Hidden Markov Net / N-gram / Speaker Adaptation / Task Adaptation / Class N-gram / Perplexity / Word Error Rate / Ergodic HMM / Multi-pass Search / Phoneme Graph / Word Graph / HM-Net / SCFG / MLLR Speaker Adaptation / LPC Mel-Cepstrum / triphone / N-gram Language Model / Read Newspaper Article Sentences |
Research Abstract |
We investigated a large vocabulary continuous speech recognition (LVCSR) system on a Japanese newspaper reading task and obtained the following results.
(1) Acoustic models: A Hidden Markov Network (HM-Net) is a highly accurate and robust acoustic model that represents the tied-state structure of context-dependent Hidden Markov Models as a network. We propose a rapid topology design method based on state clustering to generate high-accuracy HM-Nets for LVCSR. Furthermore, we investigate MLLR (Maximum Likelihood Linear Regression)-based speaker adaptation of acoustic models and propose a regression class selection algorithm based on the BIC principle.
(2) Language models: We investigate an N-gram task adaptation method that uses a large corpus of the general task (TI text) and a small corpus of the specific task (AD text), and mixes the TI and AD texts with a simple weighting. Furthermore, we propose a new SCFG (Stochastic Context-Free Grammar) model that uses a phrase-based dependency grammar instead of a general CFG. A mixture model based on the proposed SCFG model and a trigram yields a lower word error rate than the trigram alone.
(3) Decoder: We investigate fast search strategies for LVCSR and propose a new method, phoneme-graph-based hypothesis restriction, which effectively prunes the search space. In the proposed method, a phoneme graph is generated at the pre-processing stage; at the main recognition stage, the best word sequence is then searched while the expansion of hypotheses is restricted using information from the phoneme graph. In a multiple-pass LVCSR system that uses a word graph as an intermediate data structure, decoder parameters should be optimized in order to generate a good word graph. We propose a new method to optimize these parameters. This method rescores the word graph using a bigram LM instead of generating many word graphs for each parameter setting.
(4) Software tool: We describe a statistical language model toolkit for word and class-based n-grams. This toolkit has command-level compatibility with the CMU-Cambridge SLM Toolkit, and supports class n-grams and n-gram count mixture as well as combined language models using linear interpolation.
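The weighted mixing of general-task and task-specific text described under (2) can be illustrated with a small count-mixture sketch. This is not the project's toolkit or its actual weighting scheme. The function names, the toy corpora, the use of bigrams, the unsmoothed maximum-likelihood estimate, and the weight value of 4.0 are all assumptions made for illustration.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count bigrams over tokenized sentences, padding with <s> and </s>."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def mix_counts(general, specific, weight):
    """Add task-specific counts, scaled by `weight`, to the general counts."""
    mixed = Counter()
    for bigram, c in general.items():
        mixed[bigram] += c
    for bigram, c in specific.items():
        mixed[bigram] += weight * c
    return mixed


def bigram_prob(counts, prev, word):
    """Maximum-likelihood P(word | prev) from possibly fractional counts.

    No smoothing is applied; a real language model would need it.
    """
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts[(prev, word)] / total if total else 0.0


# Hypothetical toy corpora standing in for the large general-task text
# and the small task-specific text.
ti_text = [["stocks", "rose", "today"], ["stocks", "fell", "today"]]
ad_text = [["stocks", "rose", "sharply"]]

general = bigram_counts(ti_text)
mixed = mix_counts(general, bigram_counts(ad_text), weight=4.0)

p_general = bigram_prob(general, "stocks", "rose")
p_mixed = bigram_prob(mixed, "stocks", "rose")
```

In this toy setting the weight shifts the estimate of P(rose | stocks) toward the task-specific text. In practice the weight would presumably be tuned on held-out task data, for example by minimizing perplexity.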
|