A Study on Prosody Embedding Based on Gaussian Process Latent Variable Model

Research Project

Project/Area Number	17K12711
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	The University of Tokyo (2019); Tokyo Institute of Technology (2017-2018)
Principal Investigator	Koriyama Tomoki, The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (50749124)
Project Period (FY)	2017-04-01 – 2020-03-31
Project Status	Completed (Fiscal Year 2019)
Budget Amount	¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000); Fiscal Year 2018: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000); Fiscal Year 2017: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
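Keywords	Speech information processing / Prosody / Gaussian process / Machine learning / Speech synthesis / Statistical speech synthesis / Gaussian process latent variable model / Generative model / Deep Gaussian process / Accent / Semi-supervised learning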
Outline of Final Research Achievements	In statistical speech synthesis, labels must include not only text but also prosodic information. To obtain latent prosodic information such as accent from speech, we proposed speech synthesis based on a Gaussian process latent variable model. We first investigated a speech synthesis system based on deep Gaussian processes, which can extract hidden embeddings from complicated linguistic features. This framework can infer unknown prosodic information as a random variable of a probabilistic model. We therefore proposed a semi-supervised speech synthesis system in which both labeled and unlabeled speech data are used as training data, by estimating the latent prosodic features of the unlabeled speech data.
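Academic Significance and Societal Importance of the Research Achievements	Labels in speech synthesis must include not only the text but also prosodic and other information that the text does not contain, so building diverse speech synthesis systems, such as those for conversational speech or audiobooks, raises problems such as the cost of labeling. In addition, the reading of the same text can change with context, which makes it difficult to estimate prosody from text, and labels can be inconsistent between annotators. This research therefore proposed an automated machine-learning method that represents prosody in a low-dimensional latent space, and laid the groundwork for easier database construction and for richer speech synthesis with diverse prosodic expression.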

Report

(4 results)

2019 Annual Research Report, Final Research Report (PDF)
2018 Research-status Report
2017 Research-status Report

Research Products
(15 results)


Journal Article (1 result; of which Peer Reviewed: 1, Open Access: 1); Presentation (14 results; of which Int'l Joint Research: 4)

[Journal Article] Statistical Parametric Speech Synthesis Using Deep Gaussian Processes (2019)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 27 Issue: 5 Pages: 948-959
- DOI
  10.1109/taslp.2019.2905167
- Related Report
  2019 Annual Research Report 2018 Research-status Report
- Peer Reviewed / Open Access
[Presentation] Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit (2020)
- Author(s)
  Tomoki Koriyama, Hiroshi Saruwatari
- Organizer
  Proc. 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), (May 2020)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
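[Presentation] A Study on the Use of Stochastic Differential Equation Representations of Functions in Deep Gaussian Process Speech Synthesis (2020)
- Author(s)
  郡山知樹, 猿渡洋
- Organizer
  Proc. Acoustical Society of Japan 2020 Spring Meeting, 2-Q-44, pp.1127-1128. (Mar. 2020)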
- Related Report
  2019 Annual Research Report
[Presentation] A Study on an Alignment Prediction Model for Attention-Based Voice Conversion (2020)
- Author(s)
  芹川武尊, 郡山知樹, 猿渡洋
- Organizer
  Proc. Acoustical Society of Japan 2020 Spring Meeting, 2-2-2, pp.1077-1078. (Mar. 2020)
- Related Report
  2019 Annual Research Report
[Presentation] Multi-Speaker Speech Synthesis Based on Deep Gaussian Processes (2020)
- Author(s)
  三井健太郎, 郡山知樹, 猿渡洋
- Organizer
  Proc. Acoustical Society of Japan 2020 Spring Meeting, 1-2-2, pp.1043-1044. (Mar. 2020)
- Related Report
  2019 Annual Research Report
[Presentation] A Study on Speech Synthesis Based on Deep Gaussian Processes and a Latent Variable Representation of Accent (2020)
- Author(s)
  三井健太郎, 郡山知樹, 猿渡洋
- Organizer
  IEICE Technical Report, vol.119, no.398, SP2019-49, pp.31-36
- Related Report
  2019 Annual Research Report
[Presentation] Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model (2019)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Organizer
  Proc. 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), pp.4450-4454. (Sept. 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] A Training Method Using DNN-guided Layerwise Pretraining For Deep Gaussian Processes (2019)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Organizer
  Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.4785-4789. (May 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] A Study on Sequence Modeling Using Recurrent Structures in Deep Gaussian Process Based Speech Synthesis (2019)
- Author(s)
  郡山知樹, 猿渡洋
- Organizer
  Proc. Acoustical Society of Japan 2019 Autumn Meeting, 1-P-25, pp.1025-1026. (Sept. 2019)
- Related Report
  2019 Annual Research Report
[Presentation] A Training Method Using DNN-guided Layerwise Pretraining For Deep Gaussian Processes (2019)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Organizer
  Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] A Study on Speech Synthesis Based on Deep Gaussian Processes and a Latent Variable Representation of Accent (2019)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  IEICE Technical Report
- Related Report
  2018 Research-status Report
[Presentation] A Study on Pretraining for Speech Synthesis Based on Deep Gaussian Processes (2018)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  Proc. Acoustical Society of Japan 2018 Autumn Meeting
- Related Report
  2018 Research-status Report
[Presentation] A Study on the Use of Deep Architectures for GPR-Based Speech Synthesis (2018)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  Proc. Acoustical Society of Japan 2018 Spring Meeting, pp. 1507-1508
- Related Report
  2017 Research-status Report
[Presentation] A Study on the Use of Deep Gaussian Processes in GPR-Based Speech Synthesis (2018)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  IEICE Technical Report, Vol. 117, No. 517, pp. 27-32
- Related Report
  2017 Research-status Report
[Presentation] A Study on Statistical Speech Synthesis Based on a GP-DNN Hybrid Model (2018)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  IEICE Technical Report, Vol. 117, No. 393, pp. 5-10
- Related Report
  2017 Research-status Report
