2014 Fiscal Year Final Research Report

Advanced method of prosody control in statistical-based speech synthesis using generation process model of fundamental frequency contours

Research Project

Project/Area Number 24300068
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type: Partial Multi-year Fund
Section: General
Research Field: Perception information processing/Intelligent robotics
Research Institution: The University of Tokyo

Principal Investigator

HIROSE Keikichi  The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)

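Co-Investigator (Kenkyū-buntansha) MINEMATSU Nobuaki  The University of Tokyo, Graduate School of Engineering, Professor (90273333)
SAITO Daisuke  The University of Tokyo, Graduate School of Engineering, Assistant Professor (40615150)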
Project Period (FY) 2012-04-01 – 2015-03-31
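Keywords: fundamental frequency contour / generation process model / statistical speech synthesis / prosody control / voice conversion / discourse focus / multi-stream training / matrix-variate GMM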
Outline of Final Research Achievements

This research aimed to realize flexible prosody control and better speech quality in statistical speech synthesis by applying the constraints of the generation process model of fundamental frequency (F0) contours. Several methods were developed, including one that uses F0 contours approximated by the model for HMM training. In this method, the hierarchical components of the model-based F0 contours were treated separately under a multi-stream scheme, which led to better prosody control while keeping a clear relation to linguistic information. Lexical emphasis was realized by manipulating the model commands (prosody conversion). For the multi-speaker case, better speaker conversion was realized using a matrix-variate Gaussian mixture model and a deep neural network with speaker-dependent sub-networks. Research was also conducted on Chinese, along with preliminary experiments on speech translation.
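
As a rough illustration of what manipulating model commands can mean, the sketch below assumes that the generation process model is the command-response formulation commonly known as the Fujisaki model, in which log F0 is a baseline plus phrase and accent components. The constants (alpha, beta, gamma), the command values, and the function names, including `emphasize_accent`, are illustrative assumptions and are not taken from the report.

```python
import math

# Typical textbook constants for the command-response model (assumed, not from the report).
ALPHA = 3.0   # rate constant of the phrase control mechanism (1/s)
BETA = 20.0   # rate constant of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + phrase components + accent components."""
    value = math.log(base_hz)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return value


def f0_contour(times, base_hz, phrase_cmds, accent_cmds):
    """F0 in Hz at each time point (seconds)."""
    return [math.exp(log_f0(t, base_hz, phrase_cmds, accent_cmds)) for t in times]


def emphasize_accent(accent_cmds, index, factor):
    """Hypothetical helper: scale the amplitude of one accent command."""
    return [
        (on, off, amp * factor if i == index else amp)
        for i, (on, off, amp) in enumerate(accent_cmds)
    ]


# Made-up commands: one phrase command at 0 s, one accent command from 0.3 s to 0.6 s.
phrase_cmds = [(0.0, 0.5)]
accent_cmds = [(0.3, 0.6, 0.4)]
times = [i * 0.05 for i in range(21)]

neutral = f0_contour(times, 100.0, phrase_cmds, accent_cmds)
emphasized = f0_contour(times, 100.0, phrase_cmds, emphasize_accent(accent_cmds, 0, 1.5))
```

Scaling a single accent command raises F0 only around the corresponding word and leaves the phrase component untouched, which is one way to picture why working on model commands keeps the link to linguistic units explicit. How the project actually estimated and modified the commands is not described in this summary.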

Free Research Field

Spoken language information processing

Published: 2016-06-03  
