2015 Fiscal Year Final Research Report

Automatic classification of speech and audio signals using large-scale corpus and its application to speech recognition

Research Project

Project/Area Number	25330183
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Yamagata University
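Principal Investigator	Kosaka Tetsuo Yamagata University, Graduate School of Science and Engineering, Professor (50359569)
Co-Investigator (Renkei-kenkyūsha)	KATO Masaharu Yamagata University, Graduate School of Science and Engineering, Assistant Professor (10250953)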
Project Period (FY)	2013-04-01 – 2016-03-31
Keywords	Speech recognition / Acoustic model / Clustering / Hidden Markov model / Deep neural network
Outline of Final Research Achievements	Speech recognition performance has been improving thanks to the growth of speech corpora and advances in computing power. However, speech and audio signals vary widely in features such as speaker characteristics and background noise, and this variability can degrade recognition performance. In this study, we address the problem with clustering techniques: the training data are grouped into classes according to their acoustic features, and class models trained on the grouped data are used to improve recognition performance. The models were trained on a large-scale Japanese speech corpus. Both Gaussian mixture models (GMMs) and deep neural networks (DNNs) are used as acoustic models.
Free Research Field	Speech information processing