
Hands free speech recognition method based on auditory characteristics

Research Project

Project/Area Number 15500106
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Perception information processing/Intelligent robotics
Research Institution Shinshu University

Principal Investigator

MATSUMOTO Hiroshi  Shinshu University, Faculty of Engineering, Professor (60005452)

Co-Investigator (Kenkyū-buntansha) YAMAMOTO Kazumasa  Shinshu University, Faculty of Engineering, Assistant (40324230)
Project Period (FY) 2003 – 2004
Project Status Completed (Fiscal Year 2004)
Budget Amount
¥3,700,000 (Direct Cost: ¥3,700,000)
Fiscal Year 2004: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2003: ¥2,300,000 (Direct Cost: ¥2,300,000)
Keywords Hands free speech recognition / Aurora-2 database / Generalized logarithmic scale / Mel-LPC analysis / Wiener filter / Dereverberation / Distant speech recognition / Forward Masking / Real-environment speech recognition / Hidden Markov model / Dynamic cepstrum / Forward masking / Syllable model / Syllable-connected model
Research Abstract

First, we proposed forward masking of the Mel-LPC based spectrum on a generalized logarithmic scale. In addition, variance normalization and masking control based on the estimated SNR were examined to improve noise robustness.
Experiments on the Aurora-2 database showed that the Mel-LPC based cepstrum on the generalized logarithmic scale with γ = 0.1, combined with cepstral mean and variance normalization, performed best under all conditions, outperforming the normalized forward-masking parameters.
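As a rough illustration of the two feature-domain operations named above, the following sketch shows a generalized logarithm and per-utterance mean and variance normalization. The abstract does not define the generalized logarithmic scale; the form (x^γ − 1)/γ used here is a common definition and is an assumption, as are the function names `generalized_log` and `cmvn`. It is not the project's actual front end.

```python
import math


def generalized_log(x, gamma):
    """Assumed generalized logarithm: (x**gamma - 1) / gamma.

    Tends to the natural logarithm as gamma -> 0 and to x - 1 for gamma = 1.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if gamma == 0:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def cmvn(frames, eps=1e-12):
    """Mean and variance normalization per feature dimension over one utterance.

    frames: list of equal-length feature vectors (one vector per frame).
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    # eps guards against division by zero for constant dimensions
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for f in frames
    ]


# Hypothetical usage: compress a small spectrum with gamma = 0.1, then normalize.
spectrum_frames = [[1.0, 4.0], [2.0, 9.0], [0.5, 16.0]]
compressed = [[generalized_log(p, 0.1) for p in frame] for frame in spectrum_frames]
normalized = cmvn(compressed)
```

With γ = 0.1 the compression lies between a logarithm and a root function, which is one reason such scales are often considered for noisy speech: very small spectral values are compressed less aggressively than with a pure logarithm.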
Second, we developed a frequency-warped Wiener filter to enhance Mel-LPC spectra in the presence of additive noise. The proposed filter is estimated directly from the signal on the linear frequency scale and is then implemented efficiently in the autocorrelation domain, without denoising the input speech. Evaluation on the Aurora-2 database showed that the optimum filter order is comparable to that of the Mel-LPC analysis, so the filtering is computationally inexpensive. The proposed Wiener filter improves word accuracy by up to about 20%.
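Filtering in the autocorrelation domain relies on a standard identity: the autocorrelation of a filtered signal equals the autocorrelation of the input convolved with the autocorrelation of the filter's impulse response. The sketch below checks this on the ordinary linear frequency scale with a short, arbitrary FIR filter. It does not reproduce the frequency warping or the Wiener filter estimation described in the abstract, and the helper names and sample values are illustrative assumptions.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def autocorr(x):
    """Two-sided deterministic autocorrelation for lags -(N-1)..(N-1)."""
    return convolve(x, x[::-1])


# Arbitrary example data (not from the report)
x = [1.0, -2.0, 0.5, 3.0, -1.0]   # input samples
h = [0.5, 0.3, 0.2]               # impulse response of some FIR filter

# Route 1: filter the waveform, then take the autocorrelation
direct = autocorr(convolve(x, h))

# Route 2: never touch the waveform; filter the autocorrelation instead
via_autocorr = convolve(autocorr(x), autocorr(h))

max_diff = max(abs(a - b) for a, b in zip(direct, via_autocorr))
```

Because linear prediction analysis needs only a few autocorrelation lags, working on the second route avoids filtering every input sample, which is consistent with the low computational cost reported above.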
Third, to reduce the influence of reverberation, we examined a reverberation model in the power trajectory domain at the output of a mel filter in the MFCC analysis. The model parameters consist of the decay rate representing reverberation, the ratio of reverberant power to direct-sound power, and the frequency response of the channel, including part of the coloration. Recognition experiments show that the dereverberation method based on this model attains about a 10% improvement in accuracy (Acc.) compared with the unprocessed condition.
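The abstract names three model parameters but not the model equation. Purely as an illustration, the sketch below assumes one plausible form for a single mel-band power trajectory: the observed power is a channel gain times the direct power plus a scaled, exponentially decaying tail of past direct power. The equation, the function names, and the parameter values are all assumptions and should not be read as the report's formulation.

```python
def reverberate(x, gain, ratio, decay):
    """Hypothetical forward model for one mel-band power trajectory.

    y[t] = gain * (x[t] + ratio * sum_{k>=1} decay**k * x[t-k])
    """
    y, tail = [], 0.0
    for p in x:
        y.append(gain * (p + ratio * tail))
        tail = decay * (tail + p)   # exponentially decaying memory of past power
    return y


def dereverberate(y, gain, ratio, decay, floor=0.0):
    """Invert the assumed model recursively, flooring estimates at `floor`."""
    x_hat, tail = [], 0.0
    for p in y:
        est = max(p / gain - ratio * tail, floor)
        x_hat.append(est)
        tail = decay * (tail + est)  # rebuild the tail from the estimates
    return x_hat


# Hypothetical values: clean band power, then simulated and recovered trajectories
clean = [1.0, 4.0, 0.5, 0.0, 2.0]
observed = reverberate(clean, gain=0.8, ratio=0.5, decay=0.6)
recovered = dereverberate(observed, gain=0.8, ratio=0.5, decay=0.6)
```

When the three parameters are known exactly, the recursion recovers the clean trajectory; in practice they would have to be estimated from the reverberant speech, which this sketch does not attempt.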

Report

(3 results)
  • 2004 Annual Research Report   Final Research Report Summary
  • 2003 Annual Research Report
  • Research Products

    (12 results)


  • [Journal Article] Reverberation modeling on power spectral trajectory for distant speech recognition (2005)

    • Author(s)
      H.Matsumoto, T.Takei, K.Yamamoto
    • Journal Title

      Proc.of 2005 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA05)

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Frequency Warped Wiener Filtering for Mel-LPC Based Speech Recognition (2005)

    • Author(s)
      Md.Babul Islam, H.Matsumoto, K.Yamamoto
    • Journal Title

      Proc.of International Workshop on Nonlinear Signal and Image Processing (NSIP2005)

    • NAID

      10018036975

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Reverberation modeling on power spectral trajectory for distant speech recognition (2005)

    • Author(s)
      H.Matsumoto, T.Takei, K.Yamamoto
    • Journal Title

      Proc.of 2005 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA05)

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Frequency Warped Wiener Filtering for Mel-LPC Based Speech Recognition (2005)

    • Author(s)
      Md.Babul Islam, H.Matsumoto, K.Yamamoto
    • Journal Title

      Proc.of International Workshop on Nonlinear Signal and Image Processing (NSIP2005) 19PM2D-1

    • NAID

      10018036975

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Reverberation modeling on power spectral trajectory for distant speech recognition (2005)

    • Author(s)
      H.Matsumoto, T.Takei, K.Yamamoto
    • Journal Title

      Proc.of 2005 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA05)

    • NAID

      10018037278

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Frequency Warped Wiener Filtering for Mel-LPC Based Speech Recognition (2005)

    • Author(s)
      Md.Babul Islam, H.Matsumoto, K.Yamamoto
    • Journal Title

      Proc.of International Workshop on Nonlinear Signal and Image Processing (NSIP2005) (to be presented in May)

    • NAID

      10018036975

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Improved forward masking on a generalized logarithmic scale for robust speech recognition (2004)

    • Author(s)
      H.Matsumoto, T.Ichikawa, K.Yamamoto
    • Journal Title

      Proc.of 18th International Congress on Acoustics

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Syllable-connected models for Japanese speech recognition (2004)

    • Author(s)
      K.Yamamoto, T.Ikeda, H.Matsumoto, et al.
    • Journal Title

      Proc.of 18th International Congress on Acoustics

    • Description
      From "Summary of Research Results Report (Japanese)"
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Improved forward masking on a generalized logarithmic scale for robust speech recognition (2004)

    • Author(s)
      H.Matsumoto, T.Ichikawa, K.Yamamoto
    • Journal Title

      Proc.of 18th International Congress on Acoustics Th4.H.4

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Syllable-connected models for Japanese speech recognition (2004)

    • Author(s)
      K.Yamamoto, T.Ikeda, H.Matsumoto, et al.
    • Journal Title

      Proc.of 18th International Congress on Acoustics Fr2.H.2

    • Description
      From "Summary of Research Results Report (English)"
    • Related Report
      2004 Final Research Report Summary
  • [Publications] H.Matsumoto, T.Ichikawa, K.Yamamoto: "Improved forward masking on a generalized logarithmic scale for robust speech recognition" Proc.of 18th International Congress on Acoustics. (to be presented). (2004)

    • Related Report
      2003 Annual Research Report
  • [Publications] K.Yamamoto, T.Ikeda, H.Matsumoto, et al.: "Syllable-connected models for Japanese speech recognition" Proc.of 18th International Congress on Acoustics. (to be presented). (2004)

    • Related Report
      2003 Annual Research Report

Published: 2003-04-01   Modified: 2016-04-21  
