Hands free speech recognition method based on auditory characteristics
Project/Area Number |
15500106
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Shinshu University |
Principal Investigator |
MATSUMOTO Hiroshi Shinshu University, Faculty of Engineering, Professor (60005452)
|
Co-Investigator(Kenkyū-buntansha) |
YAMAMOTO Kazumasa Shinshu University, Faculty of Engineering, Assistant (40324230)
|
Project Period (FY) |
2003 – 2004
|
Project Status |
Completed (Fiscal Year 2004)
|
Budget Amount |
¥3,700,000 (Direct Cost: ¥3,700,000)
Fiscal Year 2004: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2003: ¥2,300,000 (Direct Cost: ¥2,300,000)
|
Keywords | Hands free speech recognition / Aurora-2 database / Generalized logarithmic scale / Mel-LPC analysis / Wiener filter / Dereverberation / Distant speech recognition / Forward masking / Real-environment speech recognition / Hidden Markov model / Dynamic cepstrum / Syllable model / Syllable chain model |
Research Abstract |
First, we proposed forward masking of the Mel-LPC based spectrum on a generalized logarithmic scale. In addition, variance normalization and masking control based on the estimated SNR were examined to improve noise robustness. Experimental results on the Aurora-2 database showed that the Mel-LPC based cepstrum on the generalized log scale, with cepstral mean and variance normalization and γ=0.1, gave the best performance, exceeding that of the normalized forward-masking parameter under every condition. Second, we developed a frequency-warped Wiener filter to enhance Mel-LPC spectra in the presence of additive noise. The proposed filter is estimated directly from the signal on the linear frequency scale and is then implemented efficiently in the autocorrelation domain, without denoising the input speech. Evaluation on the Aurora-2 database showed that the optimum filter order is comparable to that of the Mel-LPC analysis, so the filtering is computationally inexpensive. The proposed Wiener filter improves word accuracy by up to about 20%. Third, to reduce the influence of reverberation, we examined a reverberation model in the power-trajectory domain at the output of a mel filter in the MFCC analysis. The model parameters consist of the decay rate representing reverberation, the ratio of reverberant power to direct-sound power, and the frequency response of the channel, including some coloration. Recognition experiments show that the dereverberation method based on this model attains about a 10% improvement in accuracy (Acc.) compared with the unprocessed condition.
|
Report
(3 results)
Research Products
(12 results)