Establishment of speech synthesis framework based on Gaussian process regression
Project/Area Number |
15H02724
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Perceptual information processing
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
|
Co-Investigator (Kenkyū-buntansha) |
KORIYAMA Tomoki, Tokyo Institute of Technology, School of Engineering, Assistant Professor (50749124)
|
Research Collaborator |
MOUNGSRI Decha
NAGAHAMA Daiki
NOSE Takashi
ARIFIANTO Dhany
|
Project Period (FY) |
2015-04-01 – 2018-03-31
|
Project Status |
Completed (Fiscal Year 2017)
|
Budget Amount |
¥13,000,000 (Direct Cost: ¥10,000,000, Indirect Cost: ¥3,000,000)
Fiscal Year 2017: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2016: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000)
Fiscal Year 2015: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
|
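Keywords | Text-to-speech synthesis / Statistical parametric speech synthesis / Prosody generation / Gaussian process regression / GPR-based speech synthesis / HMM-based speech synthesis / Machine learning / Deep learning / Speech information processing / Deep Gaussian process |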
Outline of Final Research Achievements |
This research aimed to develop a novel statistical parametric speech synthesis framework based on Gaussian process regression (GPR). We proposed GPR-based prosody generation techniques, covering pitch pattern prediction and phone duration prediction, as well as a GPR-based spectral parameter generation technique. We built a GPR-based speech synthesis system and demonstrated its effectiveness by assessing the quality of the synthetic speech. We also applied the proposed framework to expressive speech generation and to generating more natural-sounding prosody in speech synthesis for a tonal language.
|
Report
(4 results)
Research Products
(49 results)