2015 Fiscal Year Final Research Report

Speech information processing using deep generative models and their factorization

Research Project

Project/Area Number	25280058
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Partial Multi-year Fund
Section	General
Research Field	Perceptual information processing
Research Institution	Tokyo Institute of Technology
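Principal Investigator	Shinoda Koichi Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (10343097)
Co-Investigator (Kenkyū-buntansha)	IWANO Koji Tokyo City University, Faculty of Media, Professor (90323823); SHINOZAKI Takahiro Tokyo Institute of Technology, Interdisciplinary Graduate School of Science and Engineering, Associate Professor (80447903)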
Project Period (FY)	2013-04-01 – 2016-03-31
Keywords	Speech information processing / Deep learning / Speaker adaptation
Outline of Final Research Achievements	In speech recognition, it is important to train an accurate deep neural network (DNN) acoustic model from a large amount of speech data collected from many speakers. In this study, we developed a framework that improves the accuracy of the DNN acoustic model by factorizing speech data into phoneme and speaker elements. First, we developed a speaker recognition method using a deep Siamese network, in which two DNNs share part of their structure. Second, we applied a DNN with a hierarchical phonetic structure to speaker adaptation. Third, we developed a speaker-adaptive training method based on a student-teacher learning framework with soft targets. With these methods, we improved speaker verification and speech recognition performance. We also studied DNN implementation and DNN structure design.
Free Research Field	Speech information processing
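
The outline mentions student-teacher learning with soft targets but does not give the loss that was used. As a minimal sketch only, the following assumes the common formulation in which the teacher's outputs are softened with a temperature and the student is trained with cross-entropy against them. The function names, the temperature value, and the toy logits are hypothetical and are not taken from the project.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between teacher soft targets and student outputs.

    Assumption: both networks are softened with the same temperature,
    which is one common choice and not necessarily the project's.
    """
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))


# Toy logits over three output classes (hypothetical values).
teacher = [2.0, 0.5, -1.0]
matched = soft_target_loss(teacher, teacher)
mismatched = soft_target_loss([-1.0, 0.5, 2.0], teacher)
```

In this sketch the loss is smallest when the student reproduces the teacher's distribution, so `mismatched` is larger than `matched`. A speaker-adaptive setup would compute this per frame over the acoustic model's output classes.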