2014 Fiscal Year Final Research Report

Improvement of speech recognition performance by using phase information with long analysis window

Research Project

Project/Area Number	24500201
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	Toyohashi University of Technology (2012, 2014); Toyota National College of Technology (2013)
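Principal Investigator	YAMAMOTO Kazumasa Toyohashi University of Technology, Graduate School of Engineering, Associate Professor (40324230)
Co-Investigator(Kenkyū-buntansha)	NAKAGAWA Seiichi Toyohashi University of Technology, Organization for the Promotion of Leading Graduate School Education, Specially Appointed Professor (20115893)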
Project Period (FY)	2012-04-01 – 2015-03-31
Keywords	speech recognition / acoustic model / acoustic features / phase spectrum / group delay spectrum / analysis window / noisy environment / deep neural network
Outline of Final Research Achievements	Conventional speech recognition typically uses acoustic features based on the amplitude spectrum (usually MFCC or PLP), while features based on the phase spectrum are largely ignored. In this research, we showed that phase spectrum based features, extracted as group delay spectrum based cepstral features using an analysis window longer (100-200 ms) than the usual one (25 ms), can be used for speech recognition in the same way as amplitude spectrum based features, and that recognition performance improves when both kinds of features are used together. We also studied deep learning based acoustic models for robust speech recognition. We modified the “noise-aware training” method for Deep Neural Network based HMMs (DNN-HMM) so that the DNN can handle “enhanced” noisy speech features and noise estimates. We then showed that the proposed method improves noisy speech recognition.
Free Research Field	Spoken language information processing
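
The outline above refers to group delay spectrum features computed over a long analysis window. As a rough illustration only, the following Python sketch frames a signal with a chosen window length and computes the group delay spectrum of each frame using the standard identity tau(w) = Re(Y(w) / X(w)), where X is the DFT of the frame x[n] and Y is the DFT of n * x[n]. The function names, the Hamming window, the 10 ms hop, and the `eps` floor are assumptions made for this example; the report does not specify them, and the sketch stops at the group delay spectrum rather than reproducing the cepstral features used in the project.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms, hop_ms=10.0):
    """Split a 1-D signal into Hamming-windowed frames (assumed framing scheme)."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < win:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hamming(win)
    return np.stack(
        [signal[i * hop : i * hop + win] * window for i in range(n_frames)]
    )


def group_delay_spectrum(frame, n_fft=None, eps=1e-10):
    """Group delay (in samples) of one frame, computed without phase unwrapping.

    Uses tau(w) = (X_R * Y_R + X_I * Y_I) / |X|^2, with Y the DFT of n * x[n].
    The eps floor (an assumption) avoids division by zero at spectral nulls.
    """
    frame = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(frame)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    power = X.real ** 2 + X.imag ** 2
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)


if __name__ == "__main__":
    sample_rate = 16000  # assumed sampling rate for the example
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(sample_rate)  # one second of placeholder noise

    short_frames = frame_signal(signal, sample_rate, window_ms=25.0)
    long_frames = frame_signal(signal, sample_rate, window_ms=100.0)
    print(short_frames.shape, long_frames.shape)

    gd = group_delay_spectrum(long_frames[0])
    print(gd.shape)
```

A longer window yields more samples per frame and therefore finer frequency resolution in the group delay spectrum, at the cost of coarser time resolution; the sketch only makes that trade-off visible and makes no claim about recognition accuracy.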