Project/Area Number |
05680294
|
Research Category |
Grant-in-Aid for General Scientific Research (C)
|
Allocation Type | Single-year Grants |
Research Field |
Intelligent informatics
|
Research Institution | Shinshu University |
Principal Investigator |
MATSUMOTO Hiroshi Shinshu University, Faculty of Engineering, Department of Electrical and Electronic Engineering, Professor (60005452)
|
Project Period (FY) |
1993 – 1994
|
Project Status |
Completed (Fiscal Year 1994)
|
Budget Amount |
¥1,800,000 (Direct Cost: ¥1,800,000)
Fiscal Year 1994: ¥300,000 (Direct Cost: ¥300,000)
Fiscal Year 1993: ¥1,500,000 (Direct Cost: ¥1,500,000)
|
Keywords | Speech Recognition / Hidden Markov Model / Noisy Environments / Frequency Weighting / Euclidean Distance / Noise Robustness |
Research Abstract |
To realize robust continuous-density Hidden Markov Models (HMMs) for noisy speech recognition, this study develops a frequency-weighted HMM based on the characteristics of human hearing, which is sensitive to formant peaks in high-SNR frequency regions. In this HMM, the covariance matrices of the Gaussian probability density functions are fixed to the inverse of frequency weighting matrices, both to exploit the robustness of group delay spectra and to incorporate their relative perceptual importance in the frequency domain into the HMM. Several frequency weighting functions and scaling methods for the frequency weighting matrices are examined using the international NOISEX-92 database. The results of word recognition tests are summarized as follows. (1) The smoothed power spectrum derived from each mean vector gives the most robust HMM. (2) The optimum scaling for converting the weighting matrices to covariance matrices is such that the sum of the weighting coefficients equals one, or such that the determinants of the converted covariances are 50 to 150 times larger than those of the initial HMMs. (3) A larger number of states is required to attain robustness in the frequency-weighted HMM. (4) Adaptive preemphasis improves robustness to noises that have less energy in the high-frequency region. (5) The frequency-weighted HMM attains SNR gains of 6 to 12 dB over a standard diagonal HMM for white, pink, and car noises. (6) Even when the noisy speech is preprocessed by the standard noise reduction method of spectral subtraction, the frequency-weighted HMM attains about 10% higher recognition scores than the conventional HMM under very low SNR conditions.
|
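The core idea in the abstract, a Gaussian output density whose covariance is fixed to the inverse of a frequency weighting matrix, can be illustrated with a minimal sketch. It assumes a diagonal weighting matrix and takes the weights as given. The function names `normalize_weights` and `weighted_log_density` and the example values are hypothetical, chosen only for illustration; the study itself derives the weights from the smoothed power spectrum of each mean vector and uses group delay spectra as features, neither of which is reproduced here.

```python
import math


def normalize_weights(weights):
    """Scale the weights so that they sum to one.

    This mirrors one of the scalings named in the abstract; the input
    values themselves are placeholders, not data from the study.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def weighted_log_density(x, mean, weights):
    """Log density of a Gaussian with covariance = inverse of diag(weights).

    With variance 1/w_k in band k, the exponent becomes a frequency-weighted
    squared Euclidean distance: sum_k w_k * (x_k - mean_k)**2.
    """
    if not (len(x) == len(mean) == len(weights)):
        raise ValueError("x, mean and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("every weight must be positive")
    dim = len(x)
    distance = sum(w * (xi - mi) ** 2 for w, xi, mi in zip(weights, x, mean))
    # log|covariance| = -sum(log w_k), so the normalizer adds +0.5 * sum(log w_k).
    log_det_precision = sum(math.log(w) for w in weights)
    return 0.5 * (log_det_precision - dim * math.log(2 * math.pi) - distance)


if __name__ == "__main__":
    # Hypothetical two-band example: band 1 is weighted three times as
    # heavily as band 0.
    w = normalize_weights([1.0, 3.0])
    mean = [0.0, 0.0]
    print(weighted_log_density([1.0, 0.0], mean, w))  # error in the low-weight band
    print(weighted_log_density([0.0, 1.0], mean, w))  # error in the high-weight band
```

In this sketch a deviation in a heavily weighted band lowers the likelihood more than the same deviation in a lightly weighted band, which is the mechanism by which the weighting can emphasize high-SNR regions such as formant peaks.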