2015 Fiscal Year Final Research Report
Speech information processing using deep generative models and their factorization
Project/Area Number |
25280058
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Partial Multi-year Fund |
Section | General |
Research Field |
Perceptual information processing
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
Shinoda Koichi Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (10343097)
|
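Co-Investigator (Kenkyū-buntansha) |
IWANO Koji Tokyo City University, Faculty of Media, Professor (90323823)
SHINOZAKI Takahiro Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Associate Professor (80447903)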
|
Project Period (FY) |
2013-04-01 – 2016-03-31
|
Keywords | Speech information processing / Deep learning / Speaker adaptation |
Outline of Final Research Achievements |
In speech recognition, it is important to train an accurate deep neural network (DNN) acoustic model from a large amount of speech data collected from many speakers. In this study, we developed a framework that improves the accuracy of the DNN acoustic model by factorizing speech data into phoneme and speaker elements. First, we developed a speaker recognition method using a deep Siamese network, in which two DNNs share part of their structure. Second, we applied a DNN with a hierarchical phonetic structure to speaker adaptation. Third, we developed a speaker-adaptive training method based on a student-teacher learning framework with soft targets. These methods improved both speaker verification and speech recognition performance. We also studied DNN implementation and DNN structure design.
|
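The student-teacher learning with soft targets mentioned in the outline can be illustrated with a minimal sketch. The report does not give the loss function or any hyperparameters, so everything here is an assumption made for illustration: the function names `softmax` and `soft_target_loss`, the `temperature` parameter, and the use of cross-entropy between temperature-softened teacher and student outputs.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature (assumed parameter)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's output against the teacher's soft targets.

    Hypothetical illustration: the teacher's softened posterior replaces the
    usual one-hot label as the training target for the student.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    matching_student = [2.0, 1.0, 0.1]
    differing_student = [0.1, 1.0, 2.0]
    print(soft_target_loss(matching_student, teacher))
    print(soft_target_loss(differing_student, teacher))
```

In this sketch the loss is smallest when the student reproduces the teacher's output distribution, which is the property a student-teacher scheme relies on; how the project combined this with speaker-adaptive training is not specified in the outline.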
Free Research Field |
Speech information processing
|