Development of robust acoustic model for hands-free speech recognition

Research Project

Project/Area Number	12680376
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	Shinshu University
Principal Investigator	MATSUMOTO Hiroshi Shinshu University, Faculty of Engineering, Professor (60005452)
Co-Investigator(Kenkyū-buntansha)	YAMAMOTO Kazumasa Shinshu University, Faculty of Engineering, Assistant (40324230)
Project Period (FY)	2000 – 2001
Project Status	Completed (Fiscal Year 2001)
Budget Amount	¥3,600,000 (Direct Cost: ¥3,600,000); Fiscal Year 2001: ¥900,000 (Direct Cost: ¥900,000); Fiscal Year 2000: ¥2,700,000 (Direct Cost: ¥2,700,000)
Keywords	Hands-free / Speech recognition / Additive Noise / Reverberation / Convolutional Noise / Dynamic Cepstrum / MLLR / Mel-LPC / Distant speech / Reverberant speech / HMM composition / Noise HMM
Research Abstract	(1) A Study on Robust Acoustic Parameters for Hands-free Speech Recognition
In hands-free speech recognition, additive and convolutional noise that vary with the distance between speaker and microphone, together with reverberation, severely degrade recognition performance. To improve robustness to these disturbances, this project examined a new feature parameter, the generalized dynamic cepstrum (DyMFGC), which is based on forward masking on a generalized logarithmic scale lying between the logarithmic and linear scales. The proposed forward masking was first applied to mel-frequency filter-bank spectra. It was then applied to mel-LPC spectra, which are derived by a simple and efficient time-domain technique that estimates an all-pole model on a mel-frequency axis. Digit recognition tests were carried out with speaker-to-microphone distances from 20 to 200 cm in a relatively quiet, small office environment. Under white-noise conditions, the DyMFGC outperformed both the dynamic cepstrum on the logarithmic spectrum and MFCC with cepstral mean normalization, and maintained a word accuracy of 90% to 95% within 1 m of the source.
(2) A Study on HMM Adaptation to Hands-free Speech
This project also developed a maximum likelihood linear regression (MLLR) technique based on singular value decomposition (SVD) and effective rank estimation. The technique can be applied to regression classes of any size and can be extended to second-order regression. A preliminary speaker-adaptation test showed that the SVD-based MLLR achieves slightly higher recognition accuracy than conventional MLLR in large-vocabulary speech recognition. Furthermore, second-order regression improved adaptation accuracy under additive-noise conditions.
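The front end in part (1) combines a generalized logarithm, which interpolates between the logarithmic and linear scales, with a forward-masking operation applied on that scale. The following is a minimal sketch under stated assumptions, not the published DyMFGC definition: it assumes the generalized logarithm s_γ(x) = (x^γ − 1)/γ (with s_0(x) = log x), models forward masking as subtracting a scaled, exponentially decaying average of preceding frames, and finishes with a type-II DCT. The function names and the parameters `gamma`, `mask_alpha`, `mask_decay`, and `num_ceps` are hypothetical, and the input is assumed to be mel filter-bank energies (mel-LPC spectra could be substituted).

```python
import math


def generalized_log(x, gamma):
    """Generalized logarithm (x**gamma - 1) / gamma, reducing to log(x) at gamma = 0.

    gamma = 0 is the logarithmic scale, gamma = 1 the linear scale (x - 1);
    0 < gamma < 1 lies between the two.
    """
    if x <= 0:
        raise ValueError("filter-bank energies must be positive")
    if gamma == 0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def dct2(values, num_ceps):
    """Unnormalized type-II DCT, keeping at most num_ceps coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(min(num_ceps, n))
    ]


def masked_cepstra(frames, gamma=0.1, mask_alpha=0.5, mask_decay=0.6, num_ceps=12):
    """Hypothetical forward-masking front end (an illustration, not the DyMFGC recipe).

    frames: sequence of mel filter-bank energy vectors, one per analysis frame.
    Each frame is compressed with the generalized log; a masking level, kept as an
    exponentially decaying average of earlier compressed frames, is scaled by
    mask_alpha and subtracted before the DCT.
    """
    masker = [0.0] * len(frames[0])  # per-band masking level; no masking at the start
    cepstra = []
    for frame in frames:
        compressed = [generalized_log(e, gamma) for e in frame]
        masked = [c - mask_alpha * m for c, m in zip(compressed, masker)]
        cepstra.append(dct2(masked, num_ceps))
        # Update the masking level with the current (unmasked) frame.
        masker = [mask_decay * m + (1.0 - mask_decay) * c for m, c in zip(masker, compressed)]
    return cepstra
```

With a stationary input the masking level converges to the compressed frame itself, so the output settles at (1 − `mask_alpha`) times the unmasked value: steady spectral components are attenuated relative to onsets, which is the intuition behind dynamic-cepstrum features.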
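Part (2) refers to MLLR estimated through SVD with effective rank estimation. For a mean transform with diagonal covariances, each row w of the transform solves G w = k, with G = Σ_r β_r ξ_r ξ_rᵀ and k = Σ_r β_r ō_r ξ_r over the Gaussians r of a regression class (ξ_r the extended mean vector, β_r the occupancy divided by the variance, ō_r the occupancy-weighted mean observation). When a class holds too few Gaussians, G is singular and a plain inverse fails. The sketch below is a hedged illustration rather than the project's implementation: it solves the system with a truncated eigen-decomposition (equal to the SVD for the symmetric positive semi-definite G) and treats singular values below `rel_tol` times the largest as outside the effective rank. That threshold rule is an assumption, as is the hypothetical `second_order=True` option of `extended_mean`, which appends squared mean terms to illustrate one possible form of second-order regression. To stay dependency-free it uses a small Jacobi eigen-solver; a library SVD routine would normally be used instead.

```python
import math


def extended_mean(mu, second_order=False):
    """xi = [1, mu], or [1, mu, mu**2] for a hypothetical second-order regression."""
    xi = [1.0] + list(mu)
    if second_order:
        xi += [m * m for m in mu]
    return xi


def jacobi_eigh(a, max_sweeps=50, tol=1e-20):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix.

    Returns (eigenvalues, eigenvectors) with eigenvectors stored as columns.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def mllr_row(xis, targets, weights, rel_tol=1e-8):
    """Estimate one MLLR transform row w (adapted mean = w . xi) for one dimension.

    xis: extended mean vectors; targets: mean observation per Gaussian;
    weights: occupancy / variance per Gaussian. Uses a truncated-SVD
    pseudo-inverse, so a singular G yields the minimum-norm solution.
    """
    d = len(xis[0])
    g = [[0.0] * d for _ in range(d)]
    k = [0.0] * d
    for xi, o, beta in zip(xis, targets, weights):
        for i in range(d):
            k[i] += beta * o * xi[i]
            for j in range(d):
                g[i][j] += beta * xi[i] * xi[j]
    eigvals, vecs = jacobi_eigh(g)
    lam_max = max(eigvals)
    w = [0.0] * d
    for idx, lam in enumerate(eigvals):
        if lam <= rel_tol * lam_max:  # beyond the effective rank: discard
            continue
        u = [vecs[r][idx] for r in range(d)]
        coef = sum(ui * ki for ui, ki in zip(u, k)) / lam
        w = [wi + coef * ui for wi, ui in zip(w, u)]
    return w
```

For example, two Gaussians with a three-dimensional ξ give a rank-2 G that cannot be inverted directly, yet `mllr_row` still returns a row that maps each Gaussian's mean onto its target observation.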
In another study, we extended parallel model combination (PMC) to the segmental unit input HMM to adapt it to speech degraded by additive noise, reverberation, or both. This method gave better recognition performance than the original PMC in additive-noise environments, but was not very effective in reverberant environments.
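PMC combines a clean-speech HMM with a noise HMM by mapping their Gaussians to the linear spectral domain, adding them, and mapping the result back. The sketch below shows the widely used log-normal approximation for diagonal Gaussians already expressed in the log-spectral domain (the cepstrum-to-log-spectrum transform is omitted). How the project applied PMC to the segmental unit input HMM is not detailed in this record; `combine_segment`, which assumes the segment vector is a concatenation of per-frame log-spectral blocks sharing one noise model, and the `gain` parameter are hypothetical.

```python
import math


def lognormal_pmc(speech_mean, speech_var, noise_mean, noise_var, gain=1.0):
    """Log-normal PMC for diagonal Gaussians given in the log-spectral domain."""
    out_mean, out_var = [], []
    for ms, vs, mn, vn in zip(speech_mean, speech_var, noise_mean, noise_var):
        # Log-normal parameters -> linear-domain mean and variance.
        lin_ms = math.exp(ms + vs / 2.0)
        lin_vs = lin_ms ** 2 * (math.exp(vs) - 1.0)
        lin_mn = math.exp(mn + vn / 2.0)
        lin_vn = lin_mn ** 2 * (math.exp(vn) - 1.0)
        # Speech and noise assumed independent and additive in the linear domain.
        mu = lin_ms + gain * lin_mn
        var = lin_vs + gain ** 2 * lin_vn
        # Linear-domain moments -> log-normal parameters.
        log_var = math.log(var / mu ** 2 + 1.0)
        out_var.append(log_var)
        out_mean.append(math.log(mu) - log_var / 2.0)
    return out_mean, out_var


def combine_segment(seg_mean, seg_var, noise_mean, noise_var, gain=1.0):
    """Hypothetical: apply lognormal_pmc to each frame block of a segment vector."""
    n = len(noise_mean)
    if len(seg_mean) % n != 0:
        raise ValueError("segment vector length must be a multiple of the frame size")
    mean, var = [], []
    for start in range(0, len(seg_mean), n):
        m, v = lognormal_pmc(seg_mean[start:start + n], seg_var[start:start + n],
                             noise_mean, noise_var, gain)
        mean += m
        var += v
    return mean, var
```

With zero variances the combination reduces to the log-add rule log(e^μs + g·e^μn), and with `gain=0` the clean-speech Gaussian is returned unchanged.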

Report

2001 Annual Research Report
2001 Final Research Report Summary
2000 Annual Research Report

Research Products

All Publications (22 results)

[Publications] Y.Ito, H.Matsumoto, K.Yamamoto: "Forward masking on a generalized logarithmic scale for robust speech recognition" Proc. of ICSLP2000. Vol.III. 530-533 (2000)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2001 Final Research Report Summary
[Publications] M.Moroto, H.Matsumoto: "Evaluation of Mel-LPC analysis by a large vocabulary Japanese dictation system" Proc. of WESTPRAC-VII. Vol.I. 93-96 (2000)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2001 Final Research Report Summary
[Publications] H.Matsumoto, et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. Vol.1. 115-118 (2001)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2001 Final Research Report Summary
[Publications] K.Yamamoto, et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. Vol.1. 183-186 (2001)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2001 Final Research Report Summary
[Publications] H.Matsumoto, M.Moroto: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of ICASSP2001. Vol.1. 117-120 (2001)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2001 Final Research Report Summary
[Publications] H.Matsumoto, et al.: "Evaluation of a generalized Dynamic Cepstrum in distant speech recognition" Proc. of Eurospeech. Vol.2. 881-884 (2001)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2001 Final Research Report Summary
[Publications] Ito, Y., Matsumoto, H., Yamamoto, K.: "Forward masking on a generalized logarithmic scale for robust speech recognition" Proc. of ICSLP. Vol.III. 530-533 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Moroto, M., Matsumoto, H.: "Evaluation of Mel-LPC analysis by a large vocabulary Japanese dictation system" Proc. of WESTPRAC-VII. 93-96 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Yamamoto, K., Nakagawa, S.: "Difference in speech recognition performance caused by difference in front-end devices and its compensation" Proc. of WESTPRAC-VII. 85-88 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Matsumoto, H., et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. 115-118 (2001)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Yamamoto, K., et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. 183-186 (2001)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Matsumoto, H. and Moroto, M.: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of ICASSP. Vol.I. 117-120 (2001)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] Matsumoto, H., et al.: "Evaluation of a generalized Dynamic Cepstrum in distant speech recognition" Proc. of Eurospeech. Vol.2. 881-884 (2001)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2001 Final Research Report Summary
[Publications] H.Matsumoto, et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. Vol.1. 115-118 (2001)
- Related Report
  2001 Annual Research Report
[Publications] K.Yamamoto, et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. Vol.1. 183-186 (2001)
- Related Report
  2001 Annual Research Report
[Publications] H.Matsumoto, M.Moroto: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of International Conference on Acoustics, Speech, and Signal Processing. Vol.I. 117-120 (2001)
- Related Report
  2001 Annual Research Report
[Publications] H.Matsumoto, et al.: "Evaluation of a generalized Dynamic Cepstrum in distant speech recognition" Proc. of Eurospeech 2001. Vol.2. 881-884 (2001)
- Related Report
  2001 Annual Research Report
[Publications] Y.Itoh, H.Matsumoto and K.Yamamoto: "Forward masking on a generalized logarithmic scale for robust speech recognition" Proc. of International Conference on Spoken Language Processing. Vol.III. 530-533 (2000)
- Related Report
  2000 Annual Research Report
[Publications] M.Moroto and H.Matsumoto: "Evaluation of Mel-LPC analysis by a large vocabulary Japanese dictation system" Proc. of WESTPRAC-VII. Vol.2. 93-96 (2000)
- Related Report
  2000 Annual Research Report
[Publications] H.Matsumoto, et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. Vol.1. (2001)
- Related Report
  2000 Annual Research Report
[Publications] K.Yamamoto, et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. Vol.1. (2001)
- Related Report
  2000 Annual Research Report
[Publications] H.Matsumoto and M.Moroto: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of International Conference on Acoustics, Speech, and Signal Processing. Vol.I. (2001)
- Related Report
  2000 Annual Research Report
