2015 Fiscal Year Final Research Report
Automatic classification of speech and audio signals using large-scale corpus and its application to speech recognition
Project/Area Number | 25330183 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Research Field | Perceptual information processing |
Research Institution | Yamagata University |
Principal Investigator | Kosaka Tetsuo, Yamagata University, Graduate School of Science and Engineering, Professor (50359569) |
Co-Investigator (Renkei-kenkyūsha) | KATO Masaharu, Yamagata University, Graduate School of Science and Engineering, Assistant Professor (10250953) |
Project Period (FY) | 2013-04-01 – 2016-03-31 |
Keywords | Speech recognition / Acoustic model / Clustering / Hidden Markov model / Deep neural network |
Outline of Final Research Achievements |
In recent years, speech recognition performance has improved thanks to larger speech corpora and greater computing power. However, speech and audio signals vary widely in characteristics such as speaker and background noise, and this variability can degrade recognition performance. In this study, we address the problem with clustering techniques. We categorize the training data according to their acoustic features and aim to improve recognition performance by using class models trained on each category. The models were trained on a large-scale Japanese speech corpus. We use not only Gaussian mixture models (GMMs) but also deep neural networks (DNNs) as acoustic models.
|
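The outline does not specify the clustering algorithm, the features used for clustering, or how a class model is chosen at recognition time. As a rough illustration only, the following Python sketch rests on several assumptions. Each utterance is summarised by its mean frame-feature vector. Utterances are grouped with plain k-means. A single diagonal Gaussian per class stands in for the GMM or DNN acoustic models. For each test utterance, the class model with the highest average log-likelihood is selected. All names (`kmeans`, `DiagGaussian`, `train_class_models`, `select_class`) are hypothetical and are not taken from the project.

```python
import math
import random
from typing import List

Vector = List[float]
Utterance = List[Vector]  # hypothetical: one utterance = a list of frame feature vectors


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means clustering; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels


class DiagGaussian:
    """One diagonal-covariance Gaussian: a toy stand-in for a class acoustic model."""

    def __init__(self, frames: List[Vector], var_floor: float = 1e-3):
        self.mean = mean_vector(frames)
        self.var = [
            max(sum((f[d] - self.mean[d]) ** 2 for f in frames) / len(frames), var_floor)
            for d in range(len(self.mean))
        ]

    def avg_log_likelihood(self, frames: List[Vector]) -> float:
        """Average per-frame log-likelihood of an utterance under this model."""
        total = 0.0
        for f in frames:
            for x, m, v in zip(f, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(frames)


def train_class_models(utterances: List[Utterance], k: int) -> List[DiagGaussian]:
    """Cluster utterances by their mean feature vector, then fit one model per class."""
    labels = kmeans([mean_vector(u) for u in utterances], k)
    models = []
    for c in range(k):
        frames = [f for u, lab in zip(utterances, labels) if lab == c for f in u]
        if frames:  # skip classes that ended up empty
            models.append(DiagGaussian(frames))
    return models


def select_class(models: List[DiagGaussian], utterance: Utterance) -> int:
    """Pick the index of the class model that best matches a test utterance."""
    return max(range(len(models)), key=lambda c: models[c].avg_log_likelihood(utterance))


if __name__ == "__main__":
    rng = random.Random(1)

    def fake_utterance(offset: float) -> Utterance:
        # 30 frames of synthetic 2-D "features" centred on (offset, -offset)
        return [[rng.gauss(offset, 1.0), rng.gauss(-offset, 1.0)] for _ in range(30)]

    # Two synthetic "conditions" (e.g. two speaker groups or noise levels)
    train = [fake_utterance(0.0) for _ in range(10)] + [fake_utterance(5.0) for _ in range(10)]
    models = train_class_models(train, k=2)
    print("class for a condition-B test utterance:", select_class(models, fake_utterance(5.0)))
```

Choosing a whole class model by likelihood is only one plausible way to use class models. A real system would train full GMM-HMM or DNN-HMM acoustic models per class rather than a single Gaussian.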
Free Research Field | Speech information processing |