2017 Fiscal Year Final Research Report

Establishment of speech synthesis framework based on Gaussian process regression

Research Project

Project/Area Number	15H02724
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perceptual information processing
Research Institution	Tokyo Institute of Technology
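Principal Investigator	Kobayashi Takao, Tokyo Institute of Technology, School of Engineering, Professor (70153616)
Co-Investigator (Kenkyū-buntansha)	Koriyama Tomoki, Tokyo Institute of Technology, School of Engineering, Assistant Professor (50749124)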
Research Collaborator	MOUNGSRI Decha, NAGAHAMA Daiki, NOSE Takashi, ARIFIANTO Dhany
Project Period (FY)	2015-04-01 – 2018-03-31
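Keywords	Text-to-speech synthesis / Statistical parametric speech synthesis / Prosody generation / Gaussian process regression / GPR-based speech synthesis / HMM-based speech synthesis / Machine learning / Deep learning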
Outline of Final Research Achievements	The purpose of this research was to develop a novel statistical parametric speech synthesis framework based on Gaussian process regression (GPR). We proposed prosody generation techniques, including pitch pattern prediction and phone duration prediction, as well as a GPR-based spectral parameter generation technique. We developed a GPR-based speech synthesis system and demonstrated its effectiveness by evaluating the quality of the synthetic speech. Furthermore, we investigated the use of the proposed framework for generating expressive speech and for generating more natural-sounding prosody in speech synthesis of a tonal language.
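The outline names Gaussian process regression as the predictor underlying prosody and spectral parameter generation but does not show how a GPR prediction is formed. The following is a generic illustration of GPR only, not the project's implementation. It assumes scalar inputs, a squared-exponential (RBF) kernel with hypothetical hyperparameters (`length_scale`, `signal_var`, `noise_var`), and mean-centred targets, and it returns the standard predictive mean k*ᵀ(K + σₙ²I)⁻¹y and variance k(x*, x*) − k*ᵀ(K + σₙ²I)⁻¹k*. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf_kernel(a: float, b: float, length_scale: float, signal_var: float) -> float:
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L with m = L L^T (m must be symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low: Matrix, b: Sequence[float]) -> List[float]:
    """Solve L x = b by forward substitution."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_t(low: Matrix, b: Sequence[float]) -> List[float]:
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gpr_predict(
    x_train: Sequence[float],
    y_train: Sequence[float],
    x_test: Sequence[float],
    length_scale: float = 0.3,  # hypothetical hyperparameter values
    signal_var: float = 1.0,
    noise_var: float = 1e-4,
) -> Tuple[List[float], List[float]]:
    """Posterior mean and variance of a zero-mean GP fitted to centred targets."""
    y_mean = sum(y_train) / len(y_train)
    resid = [y - y_mean for y in y_train]
    n = len(x_train)
    # Build K + sigma_n^2 I over the training inputs.
    k_mat = [
        [rbf_kernel(x_train[i], x_train[j], length_scale, signal_var)
         + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    low = cholesky(k_mat)
    # alpha = (K + sigma_n^2 I)^-1 (y - mean), via two triangular solves.
    alpha = solve_upper_t(low, solve_lower(low, resid))
    means: List[float] = []
    variances: List[float] = []
    for xs in x_test:
        k_star = [rbf_kernel(xi, xs, length_scale, signal_var) for xi in x_train]
        means.append(y_mean + sum(k * a for k, a in zip(k_star, alpha)))
        # k*^T (K + sigma_n^2 I)^-1 k* equals v^T v with v = L^-1 k*.
        v = solve_lower(low, k_star)
        variances.append(rbf_kernel(xs, xs, length_scale, signal_var) - sum(t * t for t in v))
    return means, variances


# Hypothetical toy data: normalised position within a syllable -> log F0.
positions = [0.0, 0.25, 0.5, 0.75, 1.0]
log_f0 = [5.0, 5.1, 5.2, 5.1, 5.0]
means, variances = gpr_predict(positions, log_f0, [0.5, 10.0, 0.0])
```

Near the training inputs the predicted mean stays close to the observed values and the variance is small; far from the data the mean falls back to the target mean and the variance rises to `signal_var`. A practical synthesizer would need multi-dimensional linguistic context inputs and hyperparameter estimation, which this sketch leaves out.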
Free Research Field	Speech information processing