
Development of robust acoustic model for hands-free speech recognition

Research Project

Project/Area Number 12680376
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Shinshu University

Principal Investigator

MATSUMOTO Hiroshi  Shinshu University, Faculty of Engineering, Professor (60005452)

Co-Investigator (Kenkyū-buntansha) YAMAMOTO Kazumasa  Shinshu University, Faculty of Engineering, Assistant (40324230)
Project Period (FY) 2000 – 2001
Project Status Completed (Fiscal Year 2001)
Budget Amount
¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2001: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 2000: ¥2,700,000 (Direct Cost: ¥2,700,000)
Keywords Hands-free / Speech recognition / Additive Noise / Reverberation / Convolutional Noise / Dynamic Cepstrum / MLLR / Mel-LPC / Distant speech / Reverberant speech / HMM composition / Noise HMM
Research Abstract

(1) A Study on Robust Acoustic Parameters for Hands-free Speech Recognition
In hands-free speech recognition, additive and convolutional noise vary with the distance between speaker and microphone, and together with reverberation they severely degrade recognition performance. To improve robustness to these disturbances, this project examined a new feature parameter, a "generalized dynamic cepstrum" (DyMFGC), based on forward masking on a generalized logarithmic scale that lies between the logarithmic and linear scales. First, the proposed forward masking was applied to mel-frequency filter bank spectra. It was then applied to mel-LPC spectra, which are derived by a simple and efficient time-domain technique for estimating an all-pole model on a mel-frequency axis.
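A scale "between the logarithmic and the linear" can be illustrated with a Box-Cox style generalized logarithm. The sketch below is only an assumed form for illustration; the function name `generalized_log` and the parameter `gamma` are hypothetical, and the abstract does not state which generalized logarithm the project used.

```python
import math

def generalized_log(x, gamma):
    """Box-Cox style generalized logarithm (assumed form, not from the report).

    gamma = 0 gives the natural logarithm; gamma = 1 gives the linear scale x - 1.
    Values in between interpolate between the two scales.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if gamma == 0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma

# Compress one filter bank energy on three different scales.
energy = 4.0
for gamma in (0.0, 0.5, 1.0):
    print(gamma, generalized_log(energy, gamma))
```

As `gamma` shrinks toward zero, the output approaches `math.log(x)`, so a single parameter moves the compression from linear to logarithmic.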
Digit recognition tests were carried out with the distance between speaker and microphone ranging from 20 to 200 cm in a relatively quiet, small office environment. Under white-noise conditions, the DyMFGC outperformed both the dynamic cepstrum computed on the logarithmic spectrum and MFCC with cepstral mean normalization, and maintained a word accuracy of 90% to 95% within 1 m of the source.
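Cepstral mean normalization, the baseline mentioned above, subtracts the per-utterance mean from every cepstral coefficient, which cancels a time-invariant convolutional distortion because such a distortion appears as a constant offset in the cepstral domain. A minimal sketch follows; the function name and the toy data are illustrative assumptions, not material from the project.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]

# Toy utterance: two frames of two cepstral coefficients each.
clean = [[1.0, 2.0], [3.0, 6.0]]

# A fixed channel adds the same cepstral offset to every frame.
channel_offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

# After normalization the channel offset no longer affects the features.
print(cepstral_mean_normalize(clean) == cepstral_mean_normalize(distorted))
```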
(2) A Study on HMM Adaptation to Hands-free Speech
Part of this project also developed a Maximum Likelihood Linear Regression (MLLR) technique based on singular value decomposition (SVD) and effective rank estimation. This technique can be applied to regression classes of any size and can be extended to second-order regression. A preliminary speaker adaptation test showed that the SVD-based MLLR achieves slightly higher recognition accuracy than conventional MLLR in large-vocabulary speech recognition. Furthermore, second-order regression improved adaptation accuracy under additive noise conditions.
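For orientation, the standard MLLR mean transform and one way an SVD with rank truncation can enter its estimation are sketched below. The notation (W, ξ, G, z, and the threshold ε) is assumed here and is not taken from the project report; the effective-rank criterion the project actually used is not stated in this abstract.

```latex
% Standard MLLR mean adaptation for a Gaussian mean \mu
\hat{\mu} = A\mu + b = W\xi, \qquad
\xi = [\,1,\ \mu^{\top}]^{\top}, \qquad W = [\,b,\ A\,]

% With diagonal covariances, each row w_i of W solves a linear system
G^{(i)} w_i^{\top} = z_i^{\top}

% Assumed SVD-based solution: keep only the r largest singular values
G^{(i)} = U \Sigma V^{\top}, \qquad
w_i^{\top} = \sum_{k=1}^{r} \frac{u_k^{\top} z_i^{\top}}{\sigma_k}\, v_k, \qquad
r = \#\{\,k : \sigma_k > \varepsilon\,\sigma_1\,\}
```

Truncating to rank r keeps the solution well defined when a regression class has too little adaptation data for G to be reliably invertible, which is consistent with the claim that the method works for regression classes of any size.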
In another study, we extended parallel model combination (PMC) to the segmental unit input HMM to adapt it to speech degraded by additive noise and/or reverberation. This method gives better recognition performance than the original PMC in additive noise environments, but is less effective in reverberant environments.
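The core idea of PMC can be illustrated with the well-known log-add approximation, which combines clean-speech and noise model means in the log-spectral domain, where additive noise adds in the linear domain. This is a minimal sketch under simplifying assumptions: it works directly on log-spectral means, skips the cepstrum-to-spectrum conversion, ignores variances, and does not show the segmental unit extension studied in the project. The function name and the `gain` parameter are hypothetical.

```python
import math

def pmc_log_add(speech_log_means, noise_log_means, gain=1.0):
    """Log-add combination per channel: log(gain * exp(mu_s) + exp(mu_n))."""
    return [
        math.log(gain * math.exp(s) + math.exp(n))
        for s, n in zip(speech_log_means, noise_log_means)
    ]

# Toy log-spectral means for a clean-speech state and a noise model.
speech = [2.0, 0.0]
noise = [-50.0, 0.0]

# Channel 0 is dominated by speech; channel 1 has equal speech and noise energy.
print(pmc_log_add(speech, noise))
```

Where the noise is far below the speech, the combined mean stays close to the speech mean; where the two are equal, the combined mean rises by log 2.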

Report

(3 results)
  • 2001 Annual Research Report   Final Research Report Summary
  • 2000 Annual Research Report
  • Research Products

    (22 results)

All Publications (22 results)

  • [Publications] Y.Ito, H.Matsumoto, K.Yamamoto: "Forward masking on a generalized logarithmic scale for robust speech recognition" Proc. of ICSLP2000. Vol.III. 530-533 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] M.Moroto, H.Matsumoto: "Evaluation of Mel-LPC analysis by a large vocabulary Japanese dictation system" Proc. of WESTPRAC-VII. Vol.I. 93-96 (2000)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] H.Matsumoto, et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. Vol.1. 115-118 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] K.Yamamoto, et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. Vol.1. 183-186 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] H.Matsumoto, M.Moroto: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of ICASSP2001. Vol.1. 117-120 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] H.Matsumoto, et al.: "Evaluation of a generalized Dynamic Cepstrum in distant speech recognition" Proc. of Eurospeech. Vol.2. 881-884 (2001)

    • Description
      From the Final Research Report Summary (Japanese)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Ito,Y., Matsumoto,H., Yamamoto,K.: "Forward masking on a generalized logarithmic scale for robust speech recognition" Proc. of ICSLP. III. 530-533 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Moroto,M., Matsumoto,H.: "Evaluation of Mel-LPC analysis by a large vocabulary Japanese dictation system" Proc. WESTPRAC-VII. 93-96 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Yamamoto,K., Nakagawa,S.: "Difference in speech recognition performance caused by difference in front-end devices and its compensation" Proc. WESTPRAC-VII. 85-88 (2000)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Matsumoto,H., et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. 115-118 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Yamamoto,K., et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. 183-186 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Matsumoto,H. and Moroto,M.: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of ICASSP. Vol. I. 117-120 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Matsumoto,H. et al.: "Evaluation of a generalized Dynamic Cepstrum in distant speech recognition" Proc. of Eurospeech. 2. 881-884 (2001)

    • Description
      From the Final Research Report Summary (English)
    • Related Report
      2001 Final Research Report Summary
  • [Publications] H.Matsumoto, et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. Vol.1. 881-884 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] K.Yamamoto, et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. Vol.1. 183-186 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Matsumoto, M.Moroto: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of International Conference on Acoustics, Speech, and Signal Processing. Vol.I. 117-120 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] H.Matsumoto, et al.: "Evaluation of a generalized Dynamic Cepstrum in distant speech recognition" Proc. of Eurospeech 2001. Vol.2. 881-884 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] Y.Itoh, H.Matsumoto and K.Yamamoto: "Forward masking on a generalized logarithmic scale for robust speech recognition" Proc. of International Conference on Spoken Language Processing. Vol.III. 530-533 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] M.Moroto and H.Matsumoto: "Evaluation of Mel-LPC analysis by a large vocabulary Japanese dictation" Proc. of WESTPRAC VII. Vol.2. 93-96 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] H.Matsumoto, et al.: "A generalized Dynamic Cepstrum for hands-free speech recognition" Proc. of HSC Workshop. Vol.1. (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] K.Yamamoto, et al.: "Evaluation of PMC for segmental unit input HMM in various environments" Proc. of HSC Workshop. Vol.1. (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] H.Matsumoto and M.Moroto: "Evaluation of Mel-LPC cepstrum in a large vocabulary continuous speech recognition" Proc. of International Conference on Acoustics, Speech, and Signal Processing. Vol.I. (2001)

    • Related Report
      2000 Annual Research Report


Published: 2000-04-01   Modified: 2016-04-21  
