Advanced method of prosody control in statistical-based speech synthesis using generation process model of fundamental frequency contours
Project/Area Number |
24300068
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Partial Multi-year Fund |
Section | General |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | The University of Tokyo |
Principal Investigator |
HIROSE Keikichi The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
|
Co-Investigator (Kenkyū-buntansha) |
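MINEMATSU Nobuaki The University of Tokyo, Graduate School of Engineering, Professor (90273333)
SAITO Daisuke The University of Tokyo, Graduate School of Engineering, Assistant Professor (40615150)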
|
Project Period (FY) |
2012-04-01 – 2015-03-31
|
Project Status |
Completed (Fiscal Year 2014)
|
Budget Amount |
¥17,810,000 (Direct Cost: ¥13,700,000, Indirect Cost: ¥4,110,000)
Fiscal Year 2014: ¥5,460,000 (Direct Cost: ¥4,200,000, Indirect Cost: ¥1,260,000)
Fiscal Year 2013: ¥5,590,000 (Direct Cost: ¥4,300,000, Indirect Cost: ¥1,290,000)
Fiscal Year 2012: ¥6,760,000 (Direct Cost: ¥5,200,000, Indirect Cost: ¥1,560,000)
|
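Keywords | fundamental frequency contour / generation process model / statistical speech synthesis / prosody control / speech conversion / discourse focus / multi-stream training / matrix-variate GMM / HMM-based speech synthesis / Deep Neural Network / multi-stream / statistical modeling / voice conversion / focus control / Chinese speech / tone nucleus model |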
Outline of Final Research Achievements |
This research aimed to realize flexible prosody control and better speech quality in statistical speech synthesis by applying the constraints of the generation process model of fundamental frequency (F0) contours. Several methods were developed, including one that uses F0 contours approximated by the model for HMM training. In this method, the hierarchical F0 contours based on the model were handled as separate streams in a multi-stream scheme, which improved prosody control while keeping a clear relation to linguistic information. Lexical emphasis was realized by manipulating the model commands (prosody conversion). Speaker conversion in the multi-speaker case was improved by using a matrix-variate Gaussian mixture model and a deep neural network with speaker-dependent sub-networks. Research was also conducted on Chinese, along with preliminary experiments on speech translation.
|
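The outline above describes F0 contours built from model commands, and emphasis obtained by manipulating those commands. The following is a minimal sketch of that idea, assuming the generation process model is the command-response (Fujisaki) model, which this record does not name explicitly. The constants `ALPHA`, `BETA`, and `GAMMA` are commonly quoted default values used here as assumptions, and the function names, the command tuples, and the `emphasize` helper are hypothetical illustrations, not the project's implementation.

```python
import math

# Assumed constants (typical textbook values, not taken from this record).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    value = math.log(fb)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset)
                              - accent_response(t - offset))
    return value


def emphasize(accent_cmds, index, factor):
    """Hypothetical emphasis: scale the amplitude of one accent command."""
    return [(on, off, amp * factor if i == index else amp)
            for i, (on, off, amp) in enumerate(accent_cmds)]


# Illustrative (made-up) commands for a short utterance.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.3, 0.6, 0.4), (0.9, 1.2, 0.3)]
emphasized_cmds = emphasize(accent_cmds, index=0, factor=1.5)
```

Because emphasis in this sketch changes only the command amplitudes, the timing of the contour, and therefore its alignment with the linguistic units, is left untouched.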
Report
(4 results)
Research Products
(37 results)