Hands free speech recognition method based on auditory characteristics

Research Project

Project/Area Number	15500106
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Shinshu University
Principal Investigator	MATSUMOTO Hiroshi Shinshu University, Faculty of Engineering, Professor (60005452)
Co-Investigator(Kenkyū-buntansha)	YAMAMOTO Kazumasa Shinshu University, Faculty of Engineering, Assistant (40324230)
Project Period (FY)	2003 – 2004
Project Status	Completed (Fiscal Year 2004)
Budget Amount	¥3,700,000 (Direct Cost: ¥3,700,000); Fiscal Year 2004: ¥1,400,000 (Direct Cost: ¥1,400,000); Fiscal Year 2003: ¥2,300,000 (Direct Cost: ¥2,300,000)
Keywords	Hands free speech recognition / Aurora-2 database / Generalized logarithmic scale / Mel-LPC analysis / Wiener filter / Dereverberation / Distant speech recognition / Forward masking / Real-environment speech recognition / Hidden Markov model / Dynamic cepstrum / Syllable model / Syllable-connected model
Research Abstract	First, we proposed forward masking of the Mel-LPC based spectrum on a generalized logarithmic scale. In addition, variance normalization and masking control based on the estimated SNR were examined to improve noise robustness. Experimental results on the Aurora-2 database showed that the Mel-LPC based cepstrum on the generalized log scale with cepstral mean and variance normalization for γ=0.1 performs better than the normalized forward masking parameter under all conditions. Second, we developed a frequency-warped Wiener filter to enhance Mel-LPC spectra in the presence of additive noise. The proposed filter is estimated directly from the signal on the linear frequency scale and is then implemented efficiently in the autocorrelation domain without denoising the input speech. Evaluation on the Aurora-2 database showed that the optimum filter order is comparable to that of the Mel-LPC analysis, so the filtering is computationally inexpensive. The proposed Wiener filter improves word accuracy by up to about 20%. Third, to reduce the influence of reverberation, we examined a reverberation model in the power trajectory domain at the output of a mel filter in the MFCC analysis. The model parameters consist of the decay rate representing reverberation, the ratio of reverberant power to direct-sound power, and the frequency response of the channel, including some coloration. Recognition experiments show that the dereverberation method based on this model attains about a 10% improvement in accuracy (Acc.) compared with the unprocessed condition.
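
The abstract names two feature-domain operations, a generalized logarithmic scale with parameter γ and cepstral mean and variance normalization, without giving formulas. The following is a minimal sketch only: it assumes the common generalized-logarithm form (x^γ − 1)/γ, which reduces to ln x as γ → 0, and per-utterance normalization of each cepstral coefficient. The function names `generalized_log` and `cmvn` are hypothetical and are not taken from the project reports.

```python
import math


def generalized_log(x, gamma):
    """Assumed generalized logarithm: (x**gamma - 1) / gamma, or ln(x) when gamma is 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if abs(gamma) < 1e-12:
        return math.log(x)
    return (x ** gamma - 1.0) / gamma


def cmvn(frames, eps=1e-10):
    """Normalize each coefficient to zero mean and unit variance over one utterance.

    frames: list of equal-length lists, one list of cepstral coefficients per frame.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # eps guards against division by zero for a constant coefficient
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for f in frames]


if __name__ == "__main__":
    # Compress a few power values with gamma = 0.1, the value cited in the abstract
    powers = [0.5, 1.0, 10.0, 100.0]
    print([round(generalized_log(p, 0.1), 4) for p in powers])
    print(cmvn([[1.0, 2.0], [3.0, 6.0]]))
```

With a small γ such as 0.1 the mapping behaves much like a logarithm for large inputs but compresses small values less steeply, which is one plausible reason such a scale is used under noise; the abstract itself does not state the rationale.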

Report

(3 results)

2004 Annual Research Report, Final Research Report Summary
2003 Annual Research Report

Research Products
(12 results)

Journal Article (10 results), Publications (2 results)

[Journal Article] Reverberation modeling on power spectral trajectory for distant speech recognition (2005)
- Author(s)
  H. Matsumoto, T. Takei, K. Yamamoto
- Journal Title
  Proc. of 2005 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA05)
- Description
  From the Summary of Research Results Report (Japanese)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Frequency Warped Wiener Filtering for Mel-LPC Based Speech Recognition (2005)
- Author(s)
  Md. Babul Islam, H. Matsumoto, K. Yamamoto
- Journal Title
  Proc. of International Workshop on Nonlinear Signal and Image Processing (NSIP2005)
- NAID
  10018036975
- Description
  From the Summary of Research Results Report (Japanese)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Reverberation modeling on power spectral trajectory for distant speech recognition (2005)
- Author(s)
  H. Matsumoto, T. Takei, K. Yamamoto
- Journal Title
  Proc. of 2005 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA05)
- Description
  From the Summary of Research Results Report (English)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Frequency Warped Wiener Filtering for Mel-LPC Based Speech Recognition (2005)
- Author(s)
  Md. Babul Islam, H. Matsumoto, K. Yamamoto
- Journal Title
  Proc. of International Workshop on Nonlinear Signal and Image Processing (NSIP2005) 19PM2D-1
- NAID
  10018036975
- Description
  From the Summary of Research Results Report (English)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Reverberation modeling on power spectral trajectory for distant speech recognition (2005)
- Author(s)
  H. Matsumoto, T. Takei, K. Yamamoto
- Journal Title
  Proc. of 2005 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA05)
- NAID
  10018037278
- Related Report
  2004 Annual Research Report
[Journal Article] Frequency Warped Wiener Filtering for Mel-LPC Based Speech Recognition (2005)
- Author(s)
  Md. Babul Islam, H. Matsumoto, K. Yamamoto
- Journal Title
  Proc. of International Workshop on Nonlinear Signal and Image Processing (NSIP2005) (to be presented in May)
- NAID
  10018036975
- Related Report
  2004 Annual Research Report
[Journal Article] Improved forward masking on a generalized logarithmic scale for robust speech recognition (2004)
- Author(s)
  H. Matsumoto, T. Ichikawa, K. Yamamoto
- Journal Title
  Proc. of 18th International Congress on Acoustics
- Description
  From the Summary of Research Results Report (Japanese)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Syllable-connected models for Japanese speech recognition (2004)
- Author(s)
  K. Yamamoto, T. Ikeda, H. Matsumoto, et al.
- Journal Title
  Proc. of 18th International Congress on Acoustics
- Description
  From the Summary of Research Results Report (Japanese)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Improved forward masking on a generalized logarithmic scale for robust speech recognition (2004)
- Author(s)
  H. Matsumoto, T. Ichikawa, K. Yamamoto
- Journal Title
  Proc. of 18th International Congress on Acoustics Th4.H.4
- Description
  From the Summary of Research Results Report (English)
- Related Report
  2004 Final Research Report Summary
[Journal Article] Syllable-connected models for Japanese speech recognition (2004)
- Author(s)
  K. Yamamoto, T. Ikeda, H. Matsumoto, et al.
- Journal Title
  Proc. of 18th International Congress on Acoustics Fr2.H.2
- Description
  From the Summary of Research Results Report (English)
- Related Report
  2004 Final Research Report Summary
[Publications] H. Matsumoto, T. Ichikawa, K. Yamamoto: "Improved forward masking on a generalized logarithmic scale for robust speech recognition" Proc. of 18th International Congress on Acoustics. (to be presented). (2004)
- Related Report
  2003 Annual Research Report
[Publications] K. Yamamoto, T. Ikeda, H. Matsumoto, et al.: "Syllable-connected models for Japanese speech recognition" Proc. of 18th International Congress on Acoustics. (to be presented). (2004)
- Related Report
  2003 Annual Research Report
