Automatic classification of speech and audio signals using large-scale corpus and its application to speech recognition
Project/Area Number |
25330183
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Perceptual information processing
|
Research Institution | Yamagata University |
Principal Investigator |
Kosaka Tetsuo Yamagata University, Graduate School of Science and Engineering, Professor (50359569)
|
Co-Investigator (Renkei-kenkyūsha) |
KATO Masaharu Yamagata University, Graduate School of Science and Engineering, Assistant Professor (10250953)
|
Project Period (FY) |
2013-04-01 – 2016-03-31
|
Project Status |
Completed (Fiscal Year 2015)
|
Budget Amount |
¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2015: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2014: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2013: ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
|
Keywords | speech recognition / acoustic model / clustering / hidden Markov model / deep neural network / deep neural net / speaker adaptation / speaker / speech corpus |
Outline of Final Research Achievements |
Speech recognition performance continues to improve, driven by larger speech corpora and advances in computing power. However, speech and audio signals vary widely in features such as speaker characteristics and background noise, and this variability can degrade recognition performance. In this study, we address the problem with clustering techniques: we attempt to improve recognition performance by using class models, each trained on data that has been categorized by its acoustic features. The models were trained on a large-scale Japanese speech corpus. As acoustic models, we use not only Gaussian mixture models (GMMs) but also deep neural networks (DNNs).
|
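The class-model idea in the outline above can be illustrated with a minimal sketch. The outline does not say which clustering algorithm, features, or model-selection rule the project used, so everything here is an assumption made for illustration: plain k-means over toy utterance-level feature vectors, one diagonal Gaussian per class standing in for the GMM or DNN class models, and class selection by maximum log-likelihood. All function names are hypothetical.

```python
import math

def kmeans(vectors, k, iters=20):
    """Plain k-means (assumed here; the source names no specific algorithm).
    Centroids start from the first k vectors so the result is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: mean of each cluster's members (old centroid kept if empty).
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def fit_diag_gaussian(vectors, var_floor=1e-3):
    """Fit one diagonal Gaussian; a stand-in for a per-class acoustic model."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    var = [max(sum((x - m) ** 2 for x in col) / n, var_floor)
           for col, m in zip(zip(*vectors), mean)]
    return mean, var

def log_likelihood(v, model):
    """Log-density of v under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(v, mean, var))

def train_class_models(vectors, k):
    """Cluster the training data, then train one model per non-empty class."""
    labels, _ = kmeans(vectors, k)
    models = {}
    for c in range(k):
        members = [v for v, l in zip(vectors, labels) if l == c]
        if members:
            models[c] = fit_diag_gaussian(members)
    return labels, models

def select_class(v, models):
    """Pick the class whose model gives the input the highest log-likelihood."""
    return max(models, key=lambda c: log_likelihood(v, models[c]))

# Toy 2-D "acoustic feature" vectors forming two well-separated groups.
train = [[0.0, 0.1], [5.0, 5.1], [0.2, -0.1], [5.2, 4.9], [-0.1, 0.0], [4.8, 5.0]]
labels, models = train_class_models(train, k=2)
chosen = select_class([4.9, 5.2], models)
```

In a real recognizer the selected class would determine which class-specific acoustic model decodes the utterance; the single Gaussian here only marks where such a model would sit.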
Report
(4 results)
Research Products
(18 results)