Project/Area Number |
15200014
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Nagoya University |
Principal Investigator |
TAKEDA Kazuya Nagoya University, Graduate School of Information Science, Professor (20273295)
|
Co-Investigator (Kenkyū-buntansha) |
SHIKANO Kiyohiro Nara Institute of Science and Technology, Graduate School of Information Science, Professor (00263426)
KAWAHARA Tatsuya Kyoto University, Graduate School of Informatics, Professor, Academic Center for Computing and Media Studies, Professor (00234104)
|
Project Period (FY) |
2003 – 2005
|
Project Status |
Completed (Fiscal Year 2005)
|
Budget Amount |
¥45,890,000 (Direct Cost: ¥35,300,000, Indirect Cost: ¥10,590,000)
Fiscal Year 2005: ¥14,950,000 (Direct Cost: ¥11,500,000, Indirect Cost: ¥3,450,000)
Fiscal Year 2004: ¥14,950,000 (Direct Cost: ¥11,500,000, Indirect Cost: ¥3,450,000)
Fiscal Year 2003: ¥15,990,000 (Direct Cost: ¥12,300,000, Indirect Cost: ¥3,690,000)
|
Keywords | speech recognition / acoustic model / speech corpus / distributed database / distributed training / sufficient statistics / speaker adaptation / distributed processing / HMM / model interpolation / continuous speech recognition |
Research Abstract |
To collect speech uttered under various environmental conditions, field tests of spoken dialogue systems were conducted for public transportation guidance, in-car information retrieval, and guidance in a public space. Based on the three resulting corpora, a prototype of a data-sharing infrastructure for acoustic model training was developed. In the system, users can search for particular speech subsets by issuing queries on speaker age, utterance SNR, and the distribution of phoneme frequencies. The system can train a set of HMMs by sharing the sufficient statistics (the visiting count, the branching count, the sum, and the square sum) of the Gaussian mixture PDFs for each state of the HMM acoustic models. In addition, to characterize each utterance, a blind SNR estimation method, i.e., one that does not require explicit voice activity detection (VAD), was developed that works over a wide range of SNRs. As for the training strategy, not only maximum likelihood (ML) training over the set of utterances but also a model adaptation method that uses only the statistics was studied. The effectiveness of the adaptation approach using pre-stored statistics for each utterance was confirmed through recognition experiments, in which the accuracy of the model trained by adaptation was almost equivalent to that of a model trained with the pooled EM algorithm.
|
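The idea of training from shared sufficient statistics can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a single diagonal-covariance Gaussian, treats the occupancy weights as given, and omits the branching (transition) counts. The names `GaussianStats`, `accumulate`, and `estimate` are hypothetical. Each site accumulates only a count, a sum, and a square sum; adding these across sites yields the same mean and variance as pooling the raw speech features.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianStats:
    """Sufficient statistics for one diagonal-covariance Gaussian (illustrative)."""
    count: float          # occupancy, i.e. the "visiting count"
    total: np.ndarray     # occupancy-weighted sum of feature vectors
    total_sq: np.ndarray  # occupancy-weighted sum of squared feature vectors

    def __add__(self, other):
        # Statistics from different sites merge by simple addition.
        return GaussianStats(self.count + other.count,
                             self.total + other.total,
                             self.total_sq + other.total_sq)


def accumulate(frames, weights):
    """Accumulate statistics from frames (T x D) with per-frame occupancies (T,)."""
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return GaussianStats(count=float(weights.sum()),
                         total=weights @ frames,
                         total_sq=weights @ (frames ** 2))


def estimate(stats):
    """Maximum likelihood mean and variance from the accumulated statistics."""
    mean = stats.total / stats.count
    var = stats.total_sq / stats.count - mean ** 2
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    site_a = rng.normal(loc=0.0, size=(50, 3))  # synthetic features held at site A
    site_b = rng.normal(loc=2.0, size=(70, 3))  # synthetic features held at site B

    # Only the statistics are exchanged, never the frames themselves.
    merged = accumulate(site_a, np.ones(50)) + accumulate(site_b, np.ones(70))
    mean, var = estimate(merged)

    pooled = np.vstack([site_a, site_b])
    print(np.allclose(mean, pooled.mean(axis=0)), np.allclose(var, pooled.var(axis=0)))
```

In an actual EM iteration the per-frame weights would be state and mixture posteriors from the forward-backward pass rather than ones, and one such set of statistics would be kept per Gaussian component.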