2014 Fiscal Year Final Research Report
Advanced method of prosody control in statistical-based speech synthesis using generation process model of fundamental frequency contours
Project/Area Number |
24300068
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Partial Multi-year Fund |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | The University of Tokyo |
Principal Investigator |
HIROSE Keikichi The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
|
Co-Investigator(Kenkyū-buntansha) |
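MINEMATSU Nobuaki The University of Tokyo, Graduate School of Engineering, Professor (90273333)
SAITO Daisuke The University of Tokyo, Graduate School of Engineering, Assistant Professor (40615150)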
|
Project Period (FY) |
2012-04-01 – 2015-03-31
|
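Keywords | fundamental frequency contour / generation process model / statistical speech synthesis / prosody control / voice conversion / discourse focus / multi-stream training / matrix-variate GMM |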
Outline of Final Research Achievements |
This research aimed to realize flexible prosody control and better speech quality in statistical speech synthesis by applying the constraints of the generation process model of fundamental frequency (F0) contours. Several methods were developed, including one that uses F0 contours approximated by the model for HMM training. In this method, the hierarchical components of the model-based F0 contours are handled as separate streams in a multi-stream scheme, which improves prosody control while keeping a clear relation to linguistic information. Lexical emphasis was realized by manipulating the model commands (prosody conversion). For the multi-speaker case, speaker conversion was improved by using a matrix-variate Gaussian mixture model and a deep neural network with speaker-dependent sub-networks. The work was also extended to Chinese, with preliminary experiments on speech translation.
|
Free Research Field |
Spoken language information processing
|
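The outline above refers to the generation process model of F0 contours and to realizing emphasis by manipulating its commands, but gives no equations. The following is a minimal sketch, assuming the commonly cited formulation in which log F0 is a baseline plus phrase components (impulse responses) and accent components (clipped step responses). The function names, the default constants (alpha = 3.0, beta = 20.0, gamma = 0.9), the command values, and the `emphasize` helper are illustrative assumptions, not taken from the report or from the project's actual implementation.

```python
import math


def phrase_response(t, alpha=3.0):
    # Impulse response of the phrase control mechanism:
    # alpha^2 * t * exp(-alpha * t) for t >= 0, else 0.
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    # Step response of the accent control mechanism, clipped at gamma.
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds):
    # phrase_cmds: list of (onset_time, magnitude)
    # accent_cmds: list of (onset_time, offset_time, amplitude)
    # Returns F0 in Hz for each time in `times` (seconds).
    contour = []
    for t in times:
        log_f0 = math.log(fb)  # baseline frequency on a log scale
        for t0, ap in phrase_cmds:
            log_f0 += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(math.exp(log_f0))
    return contour


def emphasize(accent_cmds, index, factor):
    # Hypothetical helper: scale the amplitude of one accent command
    # as a stand-in for emphasis by command manipulation.
    return [
        (t1, t2, aa * factor if i == index else aa)
        for i, (t1, t2, aa) in enumerate(accent_cmds)
    ]


# Illustrative values only (assumed, not measured data).
times = [i * 0.01 for i in range(200)]           # 0 to 1.99 s in 10 ms steps
phrase = [(0.0, 0.5)]                            # one phrase command at t = 0
accents = [(0.3, 0.7, 0.4), (1.0, 1.4, 0.3)]     # two accent commands
neutral = f0_contour(times, 120.0, phrase, accents)
focused = f0_contour(times, 120.0, phrase, emphasize(accents, 1, 1.5))
```

In this sketch, scaling the second accent command raises F0 only within the span that command influences and leaves the rest of the contour unchanged, which illustrates why operating on commands keeps the relation to linguistic units explicit.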