Improvement of speech recognition performance by using phase information with long analysis window

Research Project

Project/Area Number	24500201
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perception information processing/Intelligent robotics
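Research Institution	Toyohashi University of Technology (2012, 2014); Toyota National College of Technology (2013)
Principal Investigator	YAMAMOTO Kazumasa, Toyohashi University of Technology, Graduate School of Engineering, Associate Professor (40324230)
Co-Investigator (Kenkyū-buntansha)	NAKAGAWA Seiichi, Toyohashi University of Technology, Leading Graduate School Education Promotion Organization, Specially Appointed Professor (20115893)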
Project Period (FY)	2012-04-01 – 2015-03-31
Project Status	Completed (Fiscal Year 2014)
Budget Amount	¥5,330,000 (Direct Cost: ¥4,100,000, Indirect Cost: ¥1,230,000); Fiscal Year 2014: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2013: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2012: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
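Keywords	speech recognition / acoustic model / acoustic features / phase spectrum / group delay spectrum / analysis window / noisy environment / deep neural network / long-term analysis / group delay / deep learning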
Outline of Final Research Achievements	Conventional speech recognition typically uses amplitude-spectrum-based acoustic features (usually MFCC or PLP), while phase-spectrum-based features are largely ignored. In this research, we showed that phase-spectrum-based features, extracted as group-delay-spectrum-based cepstral features using a longer analysis window (100-200 ms) than the usual one (25 ms), can be used for speech recognition in the same way as amplitude-spectrum-based features, and that using both feature types together improves recognition performance. We also studied deep-learning-based acoustic models for robust speech recognition. We modified the "noise-aware training" method for Deep Neural Network based HMMs (DNN-HMM) so that the DNN can handle "enhanced" noisy speech features together with noise estimates, and showed that the proposed method improves noisy speech recognition.
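
As a rough illustration of the group delay spectrum mentioned in the outline, the following minimal Python sketch computes the group delay of a single windowed frame. It is not the project's implementation: the function names, the Hamming window, the 16 kHz sampling rate in the usage note, and the naive DFT are all assumptions made for illustration. It relies on the standard identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n·x[n], which avoids explicit phase unwrapping.

```python
import cmath
import math


def dft(x, n_fft):
    """Naive DFT of a real sequence, bins 0..n_fft//2 (use an FFT library in practice)."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_fft) for n, v in enumerate(x))
        for k in range(n_fft // 2 + 1)
    ]


def hamming(length):
    """Hamming window (an assumed choice; the source does not name the window type)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_length(window_ms, sample_rate_hz):
    """Number of samples in an analysis window of the given duration."""
    return int(round(window_ms * sample_rate_hz / 1000))


def group_delay_spectrum(frame, n_fft=None, eps=1e-10):
    """Group delay (in samples) per frequency bin of one windowed frame.

    X = DFT{x[n]}, Y = DFT{n * x[n]}; tau = Re(Y * conj(X)) / |X|^2.
    eps guards against division by zero at spectral nulls.
    """
    n_fft = n_fft or len(frame)
    X = dft(frame, n_fft)
    Y = dft([n * v for n, v in enumerate(frame)], n_fft)
    return [
        (x.real * y.real + x.imag * y.imag) / max(abs(x) ** 2, eps)
        for x, y in zip(X, Y)
    ]
```

Under the assumed 16 kHz sampling rate, `frame_length(25, 16000)` gives a 400-sample frame for the usual window, while `frame_length(100, 16000)` and `frame_length(200, 16000)` give 1600 and 3200 samples for the longer windows. Cepstral features would then be obtained by applying a cosine transform to the group delay spectrum, a step omitted here.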

Report

(4 results)

2014 Annual Research Report, Final Research Report (PDF)
2013 Research-status Report
2012 Research-status Report

Research Products
(7 results)

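[Presentation] Evaluation of noisy speech recognition using DNN-HMM acoustic models combining noise-aware training and spectral subtraction (SS) (2015)
- Author(s)
  阿部晃大, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2015 Spring Meeting
- Place of Presentation
  Chuo University, Korakuen Campus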
- Year and Date
  2015-03-16 – 2015-03-18
- Related Report
  2014 Annual Research Report
[Presentation] Speech recognition based on Itakura-Saito divergence and dynamics / sparseness constraints from mixed sound of speech and music by non-negative matrix factorization (2014)
- Author(s)
  Naoki Hashimoto, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  INTERSPEECH 2014
- Place of Presentation
  Singapore EXPO (Singapore)
- Year and Date
  2014-09-15 – 2014-09-18
- Related Report
  2014 Annual Research Report
[Presentation] Comparison of syllable-based and phoneme-based DNN-HMM in Japanese speech recognition (2014)
- Author(s)
  Hiroshi Seki, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA 2014)
- Place of Presentation
  Bandung Institute of Technology (Indonesia)
- Year and Date
  2014-08-20 – 2014-08-21
- Related Report
  2014 Annual Research Report
[Presentation] Fast NMF based approach and VQ based approach using MFCC distance measure for speech recognition from mixed sound (2013)
- Author(s)
  Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
- Place of Presentation
  Kaohsiung, Taiwan
- Related Report
  2013 Research-status Report
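[Presentation] Improvement of speech recognition for music-overlapped speech using NMF (2013)
- Author(s)
  Naoki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2013 Autumn Meeting
- Place of Presentation
  Toyohashi University of Technology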
- Related Report
  2013 Research-status Report
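[Presentation] Recognition of music-overlapped speech using a fast NMF method based on cepstral distance and a VQ method (2013)
- Author(s)
  Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Acoustical Society of Japan 2013 Spring Meeting
- Place of Presentation
  Tokyo University of Technology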
- Related Report
  2012 Research-status Report
[Presentation] Fast NMF based approach and improved VQ based approach for speech recognition from mixed sound (2012)
- Author(s)
  Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
- Organizer
  Asia-Pacific Signal and Information Processing Association (APSIPA) Annual Summit and Conference (ASC) 2012
- Place of Presentation
  Hollywood, USA
- Related Report
  2012 Research-status Report
