2014 Fiscal Year Final Research Report

Advanced method of prosody control in statistical-based speech synthesis using generation process model of fundamental frequency contours

Project/Area Number	24300068
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Partial Multi-year Fund
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	The University of Tokyo
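Principal Investigator	HIROSE Keikichi, The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
Co-Investigator (Kenkyū-buntansha)	MINEMATSU Nobuaki, The University of Tokyo, Graduate School of Engineering, Professor (90273333); SAITO Daisuke, The University of Tokyo, Graduate School of Engineering, Assistant Professor (40615150)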
Project Period (FY)	2012-04-01 – 2015-03-31
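Keywords	Fundamental frequency contour / Generation process model / Statistical speech synthesis / Prosody control / Voice conversion / Discourse focus / Multi-stream training / Matrix-variate GMM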
Outline of Final Research Achievements	This project aimed to realize flexible prosody control and better speech quality in statistical speech synthesis by applying the constraints of the generation process model of fundamental frequency (F0) contours. Several methods were developed, including one that uses F0 contours approximated by the model for HMM training. In that method, the hierarchical components of the model-based F0 contour were handled as separate streams in a multi-stream scheme, which improved prosody control while keeping a clear relation to linguistic information. Lexical emphasis was realized by manipulating the model commands (prosody conversion). Speaker conversion in the multi-speaker case was improved by using a matrix-variate Gaussian mixture model and a deep neural network with speaker-dependent sub-networks. The work was also extended to Chinese, with preliminary experiments on speech translation.
Free Research Field	Spoken language information processing
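
As an illustration of the generation process model referred to in the outline, the following is a minimal sketch of its commonly cited form (the Fujisaki model), in which ln F0 is the sum of a baseline, phrase components, and accent components. The function names, the command tuples, and the constants alpha = 3.0, beta = 20.0, gamma = 0.9 are illustrative assumptions and are not taken from this report. The `emphasize` helper only shows the general idea of manipulating a model command by scaling one accent amplitude; it is not the project's actual emphasis method.

```python
import math

def phrase_response(t, alpha=3.0):
    # Impulse response of the phrase control mechanism (zero before onset).
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    # Step response of the accent control mechanism, clipped at gamma.
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, fb, phrase_cmds, accent_cmds):
    # ln F0(t) = ln Fb + sum of phrase components + sum of accent components.
    # phrase_cmds: list of (onset_time, magnitude)   [hypothetical layout]
    # accent_cmds: list of (onset_time, offset_time, amplitude)
    value = math.log(fb)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset)
                              - accent_response(t - offset))
    return value

def emphasize(accent_cmds, index, factor):
    # Assumed illustration: scale the amplitude of one accent command.
    return [(on, off, amp * factor if i == index else amp)
            for i, (on, off, amp) in enumerate(accent_cmds)]

# Example values are made up for demonstration only.
phrase = [(0.0, 0.5)]
accent = [(0.3, 0.6, 0.4)]
neutral = math.exp(log_f0(0.45, 100.0, phrase, accent))
focused = math.exp(log_f0(0.45, 100.0, phrase, emphasize(accent, 0, 1.5)))
```

Because the contour is described by a small number of commands, changing one amplitude raises F0 only over the span of that accent command and leaves the phrase component untouched.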