2004 Fiscal Year Final Research Report Summary
Hands free speech recognition method based on auditory characteristics
Project/Area Number |
15500106
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | 一般 |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Shinshu University |
Principal Investigator |
MATSUMOTO Hiroshi Shinshu University, Faculty of Engineering, Professor, 工学部, 教授 (60005452)
|
Co-Investigator(Kenkyū-buntansha) |
YAMAMOTO Kazumasa Shinshu University, Faculty of Engineering, Assistant, 工学部, 助手 (40324230)
|
Project Period (FY) |
2003 – 2004
|
Keywords | Hands free speech recognition / Aurora-2 database / Generalized logarithmic scale / Mel-LPC analysis / Wiener filter / Dereverberation / Distant speech recognition / Forward Masking |
Research Abstract |
Firstly, we proposed a forward masking of Mel-LPC based spectrum on the generalized logarithmic scale. Besides, the variance normalization and a mashing control with the estimated SNR are examined for improving noise robustness. The experimental results on the Aurora-2 database showed that Mel-LPC based cepstrum on generalized log-scale with cepstrum mean and variance normalization for γ=0.1 provides the best performance over the normalized forward masking parameter under any condition. Secondly, We developed a frequency warped Wiener filter to enhance Mel-LPC spectra in presence of additive noise. The proposed filter is directly estimated from the signal on the linear frequency scale and then is efficiently implemented in the autocorrelation domain without denoising input speech. As a result of evaluation using Aurora 2 database, the optimum filter order is shown to be comparable to that of Mel-LPC analysis, and thus filtering is computationally inexpensive. Word accuracy is improved by about 20% at most with the proposed Wiener filter. Thirdly, in order to reduce the influence of reverberation, we examined a reverberation model on the power trajectory domain at the output of a mel-filter in the MFCC analysis. The model parameters consists of the decay rate representing reverberation, the ratio of reverberant power to the direct sound, and the frequency response of the channel including some parts of coloration. Recognition experiments show that the dereverberation method based on this model attains about 10% improvement in Ace. compared to non-processed conditions.
|