2019 Fiscal Year Final Research Report

A Study on Prosody Embedding Based on Gaussian Process Latent Variable Model

Research Project

Project/Area Number 17K12711
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Multi-year Fund
Research Field Perceptual information processing
Research Institution The University of Tokyo (2019)
Tokyo Institute of Technology (2017-2018)

Principal Investigator

Koriyama Tomoki  The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (50749124)

Project Period (FY) 2017-04-01 – 2020-03-31
Keywords speech information processing / prosody / Gaussian process / machine learning / speech synthesis
Outline of Final Research Achievements

In statistical speech synthesis, the labels used for training must include not only the text but also prosodic information. As a method for obtaining latent prosodic information, such as accent, from speech, we proposed speech synthesis using a Gaussian process latent variable model. In this study, we first investigated a speech synthesis system based on deep Gaussian processes, which can extract hidden embeddings from complicated linguistic features. This speech synthesis framework can infer unknown prosodic information as a random variable of a probabilistic model. We therefore proposed a semi-supervised speech synthesis system in which both labeled and unlabeled speech data are used as training data by estimating the latent prosodic features of the unlabeled speech data.
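
As a rough illustration of the kind of objective a Gaussian process latent variable model optimizes, the standard single-layer marginal log-likelihood is sketched below. The symbols are assumptions introduced here, not notation from the report: \mathbf{Y} is an N x D matrix of observed acoustic features, \mathbf{X} is an N x Q matrix of latent (for example, prosodic) variables, k is a kernel function, and \sigma^2 is a noise variance. The report does not state the exact objective used, and a deep Gaussian process would typically be trained with a variational lower bound rather than this closed form.

```latex
\log p(\mathbf{Y} \mid \mathbf{X})
  = -\frac{DN}{2}\log 2\pi
    - \frac{D}{2}\log\lvert\mathbf{K}\rvert
    - \frac{1}{2}\operatorname{tr}\!\left(\mathbf{K}^{-1}\mathbf{Y}\mathbf{Y}^{\top}\right),
\qquad
\mathbf{K} = k(\mathbf{X},\mathbf{X}) + \sigma^{2}\mathbf{I}
```

Under this reading, a semi-supervised setup could hold the rows of \mathbf{X} for labeled utterances at their known values and estimate only the rows for unlabeled utterances, for instance by maximizing the quantity above with respect to those rows.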

Free Research Field

Speech information processing

Academic Significance and Societal Importance of the Research Achievements

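Labels for speech synthesis must contain not only the text but also information absent from the text, such as prosody, so building diverse speech synthesis systems for conversational speech, audiobooks, and the like raises problems such as the cost of labeling. In addition, the same text can be read differently depending on context, which makes prosody hard to estimate from text, and labels can disagree between annotators. In this study we therefore proposed an automated method that uses machine learning to represent prosody in a low-dimensional latent space, and laid the groundwork for easier database construction and for richer speech synthesis with diverse prosodic expression.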

Published: 2021-02-19  
