Project/Area Number | 07680379
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Single-year Grants
Section | General
Research Field | Intelligent informatics
Research Institution | Yamagata University
Principal Investigator | KOHDA Masaki, Yamagata University, Faculty of Engineering, Professor (00205337)
Co-Investigator (Kenkyū-buntansha) | KATOH Masaharu, Yamagata University, Faculty of Engineering, Assistant (10250953)
Project Period (FY) | 1995 – 1996
Project Status | Completed (Fiscal Year 1996)
Budget Amount | ¥1,700,000 (Direct Cost: ¥1,700,000)
Fiscal Year 1996: ¥300,000 (Direct Cost: ¥300,000)
Fiscal Year 1995: ¥1,400,000 (Direct Cost: ¥1,400,000)
Keywords | Speech Recognition / Acoustic Model / Language Model / Hidden Markov Model / Hidden Markov Net / N-gram / Phoneme Decision Tree / Likelihood Normalization / Search Method / Word Preselection
Research Abstract |
Spontaneous speech recognition can be regarded as a graph-search problem subject to various constraints imposed by the acoustic model, the lexicon, the language model, and so on. To reduce the computational cost of recognition without degrading recognition performance, several key technologies for spontaneous speech recognition were investigated.

(1) Acoustic model and speaker adaptation: The central issues in context-dependent acoustic modeling with a limited training data set are how to tie the model parameters and how to handle unseen contexts. We proposed a decision tree-based successive state splitting algorithm and showed that the HM-Net generated by this algorithm was highly accurate and could represent arbitrary contexts. Speaker adaptation of the acoustic model parameters based on MAP estimation was also investigated.

(2) Fast matching and likelihood normalization: For large-vocabulary word recognition, a fast preselection of word candidates was investigated. Phoneme recognition was carried out on the input speech to obtain an optimal phoneme sequence, and word candidates were selected by DP matching against this sequence. The candidates were then verified by Viterbi scoring of the input speech against HMM-based word models. A technique for normalizing word likelihoods in spontaneous speech recognition was also investigated.

(3) Language model and task adaptation: N-gram language models were constructed from the EDR corpus, a 5-million-word Japanese corpus, and evaluated under various conditions of training text size, vocabulary, and cutoff. The experiments clarified the optimum condition for a given training text size. Further experiments addressed task adaptation: mixing an N-gram model built from dialog data with the N-gram model from the EDR corpus reduced perplexity by about 60%.
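The MAP estimation used for speaker adaptation in (1) can be illustrated with a minimal sketch. This is not the project's implementation; all names and data below are hypothetical. With a conjugate prior of strength tau, the MAP estimate of a single Gaussian mean is a weighted average of the speaker-independent prior mean and the sample mean of the adaptation data:

```python
def map_adapt_mean(prior_mean, tau, samples):
    """MAP estimate of a Gaussian mean with conjugate prior strength tau:
    interpolates the speaker-independent prior mean with the sample mean
    of the speaker's adaptation data."""
    n = len(samples)
    sample_mean = sum(samples) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)

# With little adaptation data the estimate stays near the prior;
# with more data it moves toward the speaker's own sample mean.
print(map_adapt_mean(0.0, tau=10.0, samples=[1.0, 1.0]))    # 2/12, near prior
print(map_adapt_mean(0.0, tau=10.0, samples=[1.0] * 100))   # 100/110, near data
```

This smooth prior-to-data transition is what makes MAP adaptation robust when only a few utterances per speaker are available.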
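The fast word preselection in (2) can be sketched as follows: the recognizer's optimal phoneme sequence is DP-matched (edit distance) against each lexicon entry, and the closest words are kept for the more expensive HMM Viterbi verification. The lexicon and phoneme strings here are toy illustrations, not the project's data:

```python
def edit_distance(a, b):
    """DP matching: minimal insertions, deletions, and substitutions
    needed to turn phoneme sequence a into sequence b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def preselect(recognized, lexicon, n_best=2):
    """Keep the n_best lexicon words closest to the recognized sequence."""
    scored = sorted(lexicon.items(),
                    key=lambda kv: edit_distance(recognized, kv[1]))
    return [word for word, _ in scored[:n_best]]

# Toy lexicon: word -> phoneme sequence (illustrative only)
lexicon = {
    "yama": ["y", "a", "m", "a"],
    "kawa": ["k", "a", "w", "a"],
    "yume": ["y", "u", "m", "e"],
}
print(preselect(["y", "a", "m", "e"], lexicon))  # → ['yama', 'yume']
```

Only the surviving candidates would then be rescored against full HMM word models, which is where the computational saving comes from.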
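The task-adaptation idea in (3) amounts to linearly interpolating a task-specific N-gram with a general-corpus N-gram and checking perplexity on task-domain text. The sketch below uses toy bigram models and toy text (not the EDR corpus or the project's dialog data) to show the mechanism:

```python
import math
from collections import Counter

def bigram_model(tokens, vocab_size, alpha=0.5):
    """Add-alpha smoothed bigram model: returns a function P(w2 | w1)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])
    def p(w1, w2):
        return (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * vocab_size)
    return p

def perplexity(p, tokens):
    """Per-bigram perplexity of the text under model p."""
    log_prob = sum(math.log(p(w1, w2)) for w1, w2 in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))

# Toy "general corpus" and "task (dialog) corpus"
general = "the cat sat on the mat and the dog sat on the rug".split()
task = "recognize speech with a language model and an acoustic model".split()
test_text = "recognize speech with a language model".split()

V = len(set(general) | set(task))
p_general = bigram_model(general, V)
p_task = bigram_model(task, V)

# Linear interpolation of the two models (mixture weight 0.5)
lam = 0.5
p_mixed = lambda w1, w2: lam * p_task(w1, w2) + (1 - lam) * p_general(w1, w2)

# Mixing in the task model lowers perplexity on task-domain text
print(perplexity(p_general, test_text) > perplexity(p_mixed, test_text))  # → True
```

The reported ~60% perplexity reduction came from exactly this kind of mixture, with the dialog-trained model supplying probability mass for task-specific word sequences that the general corpus rarely covers.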