Development of robust acoustic model for hands-free speech recognition
Project/Area Number |
12680376
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Shinshu University |
Principal Investigator |
MATSUMOTO Hiroshi Shinshu University, Faculty of Engineering, Professor (60005452)
|
Co-Investigator(Kenkyū-buntansha) |
YAMAMOTO Kazumasa Shinshu University, Faculty of Engineering, Assistant (40324230)
|
Project Period (FY) |
2000 – 2001
|
Project Status |
Completed (Fiscal Year 2001)
|
Budget Amount |
¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2001: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 2000: ¥2,700,000 (Direct Cost: ¥2,700,000)
|
Keywords | Hands-free / Speech recognition / Additive Noise / Reverberation / Convolutional Noise / Dynamic Cepstrum / MLLR / Mel-LPC / Distant speech / Reverberant speech / HMM composition / Noise HMM |
Research Abstract |
(1) A Study on Robust Acoustic Parameters for Hands-free Speech Recognition
In hands-free speech recognition, reverberation and the variation of additive and convolutional noise caused by the changing distance between speaker and microphone severely degrade recognition performance. To improve robustness to these disturbances, this project examined a new feature parameter, the generalized dynamic cepstrum (DyMFGC), which is based on forward masking applied on a generalized logarithmic scale lying between the logarithmic and linear scales. The proposed forward masking was first applied to mel-frequency filter-bank spectra. It was then applied to mel-LPC spectra, which are derived by a simple and efficient time-domain technique that estimates an all-pole model on a mel-frequency axis. Digit recognition tests were carried out with speaker-to-microphone distances from 20 to 200 cm in a relatively quiet, small office environment. Under white-noise conditions, DyMFGC outperforms both the dynamic cepstrum computed on the logarithmic spectrum and MFCC with cepstral mean normalization, and maintains a word accuracy of 90% to 95% within 1 m of the source.
(2) A Study on HMM Adaptation to Hands-free Speech
This project also developed a Maximum Likelihood Linear Regression (MLLR) technique based on singular value decomposition (SVD) and effective rank estimation. The technique can be applied to regression classes of any size and can also be extended to second-order regression. A preliminary speaker-adaptation test showed that SVD-based MLLR achieves slightly higher recognition accuracy than conventional MLLR in large-vocabulary speech recognition. Furthermore, second-order regression improves adaptation accuracy under additive-noise conditions. In a separate study, parallel model combination (PMC) was extended to the segmental unit input HMM in order to adapt it to speech degraded by additive noise and/or reverberation. This method gives better recognition performance than the original PMC in additive-noise environments, but is less effective in reverberant environments.
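The forward-masking idea behind DyMFGC in part (1) can be illustrated with a minimal sketch. It assumes mel filter-bank energies are already computed, uses the generalized logarithm (x^γ − 1)/γ (which tends to log x as γ → 0), and models the masking level as an exponentially decaying average of previous frames, a fraction of which is subtracted from the current frame before a DCT. The function names and the parameters `gamma`, `mu` and `decay` are hypothetical; the project's actual masking weights and any spectral smoothing of the masking pattern are not given here, so this is not the DyMFGC definition itself.

```python
import numpy as np

def generalized_log(x, gamma):
    """Generalized logarithm (x**gamma - 1) / gamma; equals log(x) when gamma == 0."""
    x = np.maximum(x, 1e-10)  # floor to avoid log(0) and 0**gamma issues
    if gamma == 0.0:
        return np.log(x)
    return (np.power(x, gamma) - 1.0) / gamma

def dct_ii(x, n_ceps):
    """Unnormalized DCT-II over the last axis, keeping the first n_ceps terms."""
    K = x.shape[-1]
    q = np.arange(n_ceps)[:, None]
    k = np.arange(K)[None, :]
    basis = np.cos(np.pi * q * (k + 0.5) / K)  # shape (n_ceps, K)
    return x @ basis.T

def masked_generalized_cepstrum(fbank, gamma=0.1, mu=0.5, decay=0.6, n_ceps=13):
    """Hypothetical forward-masked generalized cepstrum.

    fbank: (T, K) mel filter-bank energies.
    mask holds an exponentially decaying average of earlier frames'
    generalized-log spectra; mu * mask is subtracted from each frame.
    """
    S = generalized_log(np.asarray(fbank, dtype=float), gamma)
    out = np.empty_like(S)
    mask = np.zeros(S.shape[1])  # no masking before the first frame
    for t in range(S.shape[0]):
        out[t] = S[t] - mu * mask
        mask = decay * mask + (1.0 - decay) * S[t]
    return dct_ii(out, n_ceps)
```

For a stationary input the masked spectrum at frame t is S·(1 − mu + mu·decay^t), so with `mu=1.0` the stationary part dies out after a few frames, somewhat like a running mean normalization; smaller `mu` keeps part of it.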
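Part (1) describes mel-LPC only as a time-domain technique for estimating an all-pole model on a mel-frequency axis. One well-known way to do this, shown here as an assumption rather than the project's exact procedure, is frequency-warped LPC: the frame is passed repeatedly through a first-order all-pass filter D(z) = (z^-1 − α)/(1 − α z^-1), the warped autocorrelation r(m) = Σ_n x(n) x_m(n) is formed from the m-times-filtered signal x_m, and the Levinson-Durbin recursion is run on r. The helper names and the default `alpha` are illustrative; values near 0.4 are commonly used at 16 kHz.

```python
import numpy as np

def allpass(x, alpha):
    """First-order all-pass filter D(z) = (z^-1 - alpha) / (1 - alpha z^-1)."""
    y = np.zeros_like(x)
    x_prev, y_prev = 0.0, 0.0
    for n in range(len(x)):
        y[n] = -alpha * x[n] + x_prev + alpha * y_prev
        x_prev, y_prev = x[n], y[n]
    return y

def warped_autocorr(frame, alpha, order):
    """r[m] = sum_n x[n] * x_m[n], where x_m is x filtered m times by D(z)."""
    x = np.asarray(frame, dtype=float)
    r = np.zeros(order + 1)
    r[0] = np.dot(x, x)
    xm = x.copy()
    for m in range(1, order + 1):
        xm = allpass(xm, alpha)
        r[m] = np.dot(x, xm)
    return r

def levinson(r, order):
    """Levinson-Durbin for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p; returns (a, residual energy)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def mel_lpc(frame, alpha=0.42, order=12):
    """All-pole model on a warped (mel-like) frequency axis (hypothetical helper)."""
    return levinson(warped_autocorr(frame, alpha, order), order)
```

With `alpha=0.0` the all-pass stage reduces to a one-sample delay, so the procedure falls back to ordinary autocorrelation LPC; windowing the frame is left to the caller.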
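Part (2) states that SVD-based MLLR can be applied to regression classes of any size. A minimal sketch of why, under assumptions not stated in the source: in the usual diagonal-covariance MLLR mean transform μ̂ = Wξ with ξ = [1, μᵀ]ᵀ, each row of W solves a linear system G_i w_i = k_i built from occupation statistics. When a class contains few Gaussians, G_i is singular, so the sketch solves it with a truncated-SVD pseudo-inverse and takes as the effective rank the number of singular values above `rel_tol` times the largest. That threshold rule is a hypothetical stand-in for the project's effective-rank estimation, and second-order regression is not shown.

```python
import numpy as np

def svd_solve(G, k, rel_tol=1e-6):
    """Minimum-norm solution of G w = k using only singular values above rel_tol * s_max."""
    U, s, Vt = np.linalg.svd(G)
    rank = int(np.sum(s > rel_tol * s[0]))  # hypothetical effective-rank rule
    coeffs = (U[:, :rank].T @ k) / s[:rank]
    return Vt[:rank].T @ coeffs, rank

def mllr_mean_transform(means, variances, occ, obs_sum, rel_tol=1e-6):
    """Estimate W (D x (D+1)) so that adapted mean = W @ [1, mu].

    means, variances: (M, D) Gaussian means and diagonal variances in one class
    occ: (M,) occupation counts sum_t gamma_m(t)
    obs_sum: (M, D) occupation-weighted observation sums sum_t gamma_m(t) o_t
    """
    M, D = means.shape
    xi = np.hstack([np.ones((M, 1)), means])  # extended mean vectors
    W = np.zeros((D, D + 1))
    ranks = []
    for i in range(D):
        inv_var = 1.0 / variances[:, i]
        G = (xi * (occ * inv_var)[:, None]).T @ xi  # sum_m c_m xi_m xi_m^T
        k = xi.T @ (obs_sum[:, i] * inv_var)
        W[i], r = svd_solve(G, k, rel_tol)
        ranks.append(r)
    return W, ranks
```

With a single Gaussian in the class, each G_i has rank 1 and cannot be inverted directly, but the minimum-norm SVD solution still maps that Gaussian's mean exactly onto its target.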
|
Report
(3 results)
Research Products
(22 results)