2017 Fiscal Year Final Research Report

Accurate speech recognition system with deep neural network introducing human auditory characteristic in real environments

Project/Area Number	15K00233
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Chubu University (2017); Toyohashi University of Technology (2015-2016)
Principal Investigator	YAMAMOTO Kazumasa, Chubu University, College of Engineering, Associate Professor (40324230)
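Co-Investigator(Kenkyū-buntansha)	NAKAGAWA Seiichi, Toyohashi University of Technology, Leading Graduate School Education Promotion Organization, Specially Appointed Professor (20115893)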
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	speech recognition / deep learning / Deep Neural Network / auditory characteristics / acoustic features / filterbank
Outline of Final Research Achievements	Deep learning has been introduced into speech recognition, and the technology is gradually coming into practical use, but recognition performance is still insufficient in noisy environments and for distant-talking speech. The purpose of this research is to improve speech recognition accuracy by combining a DNN (Deep Neural Network) acoustic model with human auditory characteristics. We proposed a method that automatically learns the feature-extraction filterbank at the bottom of the DNN acoustic model through deep learning that takes human auditory characteristics into account. The method improved accuracy in speaker-independent speech recognition. It also improved speaker-adapted recognition accuracy even when only a small amount of adaptation data was available. These results demonstrate the effectiveness of the proposed method.
Free Research Field	Speech information processing