2014 Fiscal Year Final Research Report
Improvement of speech recognition performance by using phase information with long analysis window
Project/Area Number | 24500201 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Research Field | Perception information processing/Intelligent robotics |
Research Institution | Toyohashi University of Technology (2012, 2014); Toyota National College of Technology (2013) |
Principal Investigator | YAMAMOTO Kazumasa, Toyohashi University of Technology, Graduate School of Engineering, Associate Professor (40324230) |
Co-Investigator (Kenkyū-buntansha) | NAKAGAWA Seiichi, Toyohashi University of Technology, Institute for Promotion of Leading Graduate School Education, Specially Appointed Professor (20115893) |
Project Period (FY) | 2012-04-01 – 2015-03-31 |
Keywords | speech recognition / acoustic model / acoustic features / phase spectrum / group delay spectrum / analysis window / noisy environment / deep neural network |
Outline of Final Research Achievements |
Conventional speech recognition systems usually use amplitude-spectrum-based acoustic features (typically MFCC or PLP), while features based on the phase spectrum have been almost entirely ignored. In this research, we extracted phase-spectrum-based features as cepstral features computed from the group delay spectrum, using an analysis window (100-200 ms) much longer than the usual one (25 ms). We showed that these features can be used for speech recognition just as amplitude-spectrum-based features are, and that using both types of features together improves speech recognition performance.

We also studied deep-learning-based acoustic models for robust speech recognition. We modified the “noise aware training” method for Deep Neural Network based HMMs (DNN-HMM) so that the DNN can take both “enhanced” noisy speech features and noise estimates as input, and we showed that the proposed method improves the recognition of noisy speech.
|
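As a rough illustration of the first part of the outline, the following is a minimal sketch of group-delay-based cepstral feature extraction with a long analysis window. It assumes 16 kHz audio, a 100 ms Hamming window, a 10 ms frame shift, and a plain DCT-II of the raw group delay spectrum to obtain 13 coefficients; these settings and the function names (`frame_signal`, `group_delay`, `dct_ii`, `group_delay_cepstrum`) are hypothetical choices for illustration, not the configuration used in this research. The group delay τ(ω) = -dφ(ω)/dω is computed without phase unwrapping as Re{Y(ω)X*(ω)}/|X(ω)|², where X is the DFT of the frame x[n] and Y is the DFT of n·x[n]. Practical systems often also smooth or modify the group delay spectrum, which is omitted here.

```python
import numpy as np


def frame_signal(x, win_len, hop):
    """Split a 1-D signal into overlapping frames of length win_len (hypothetical helper)."""
    n_frames = 1 + (len(x) - win_len) // hop
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def group_delay(frames, n_fft, eps=1e-10):
    """Group delay spectrum (in samples) of each frame, computed without phase unwrapping.

    tau(w) = Re{Y(w) X*(w)} / |X(w)|^2, where X = DFT(x[n]) and Y = DFT(n * x[n]).
    """
    n = np.arange(frames.shape[1])
    X = np.fft.rfft(frames, n_fft, axis=1)
    Y = np.fft.rfft(frames * n, n_fft, axis=1)
    power = np.abs(X) ** 2
    # eps only guards against division by zero at spectral nulls.
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)


def dct_ii(x, n_coef):
    """Unnormalized DCT-II along the last axis, keeping the first n_coef coefficients."""
    N = x.shape[-1]
    k = np.arange(n_coef)[:, None]
    n = np.arange(N)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    return x @ basis.T


def group_delay_cepstrum(x, sr=16000, win_ms=100, hop_ms=10, n_ceps=13):
    """Cepstrum-like features from the group delay spectrum (assumed settings, see lead-in)."""
    win_len = int(sr * win_ms / 1000)        # 100 ms -> 1600 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)            # 10 ms -> 160 samples
    n_fft = 1 << (win_len - 1).bit_length()  # next power of two (2048 here)
    frames = frame_signal(x, win_len, hop) * np.hamming(win_len)
    tau = group_delay(frames, n_fft)
    return dct_ii(tau, n_ceps)
```

For a frame containing a single impulse delayed by d samples, `group_delay` returns d at every frequency bin, which is a convenient sanity check. Switching between a conventional 25 ms window and a long 100-200 ms window only requires changing `win_ms`.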
Free Research Field | Spoken language information processing |