
A Study on Prosody Embedding Based on Gaussian Process Latent Variable Model

Research Project

Project/Area Number 17K12711
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Multi-year Fund
Research Field Perceptual information processing
Research Institution The University of Tokyo (2019)
Tokyo Institute of Technology (2017-2018)

Principal Investigator

Koriyama Tomoki, The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (50749124)

Project Period (FY) 2017-04-01 – 2020-03-31
Project Status Completed (Fiscal Year 2019)
Budget Amount
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2018: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Fiscal Year 2017: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
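Keywords: Speech information processing / Prosody / Gaussian process / Machine learning / Speech synthesis / Statistical speech synthesis / Gaussian process latent variable model / Generative model / Deep Gaussian process / Accent / Semi-supervised learning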
Outline of Final Research Achievements

In statistical speech synthesis, the labels must include not only text but also prosodic information. As a method for obtaining latent prosodic information, such as accent, from speech, we proposed speech synthesis using a Gaussian process latent variable model. In this study, we first investigated a speech synthesis system based on deep Gaussian processes, which can extract hidden embeddings from complicated linguistic features. Because this synthesis framework is a probabilistic model, it can infer unknown prosodic information as a random variable. Building on this, we proposed a semi-supervised speech synthesis system in which both labeled and unlabeled speech data are used for training, by estimating the latent prosodic features of the unlabeled speech data.
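The outline above rests on Gaussian processes (GPs). As a hedged illustration of the basic building block only, and not of the investigators' deep GP or GP latent variable model (GPLVM) systems, the sketch below computes the posterior mean and variance of ordinary GP regression in pure Python. The squared-exponential kernel, its hyperparameters, the noise level, the function names (`rbf`, `gp_predict`, and the helpers), and the toy data are all illustrative assumptions. The returned variance is what makes the model probabilistic: far from the training data, predictions fall back to the prior with high uncertainty. A GPLVM goes further by also treating the inputs themselves (for example, latent prosodic features) as random variables to be inferred, which this sketch does not do.

```python
import math


def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)


def cholesky(a):
    """Return lower-triangular L with L L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train, y_train, x_test, lengthscale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and variance of f(x*) under zero-mean GP regression (hypothetical helper)."""
    n = len(x_train)
    # Training covariance K + noise * I
    K = [[rbf(x_train[i], x_train[j], lengthscale, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # alpha = K^{-1} y, computed with two triangular solves
    alpha = backward_sub(L, forward_sub(L, y_train))
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, xt, lengthscale, variance) for xt in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        # Latent variance k(x*, x*) - k*^T K^{-1} k*, using ||L^{-1} k*||^2
        v = forward_sub(L, k_star)
        var = rbf(xs, xs, lengthscale, variance) - sum(vi ** 2 for vi in v)
        variances.append(max(var, 0.0))  # clip tiny negative values caused by rounding
    return means, variances


# Hypothetical toy data: three 1-D inputs with scalar targets
means, variances = gp_predict([[0.0], [1.0], [2.0]], [0.0, 1.0, 0.0], [[1.0], [10.0]])
print(means, variances)
```

With this toy data, the prediction at x = 1.0 stays close to the training target with a small variance, while at x = 10.0 the mean returns to roughly zero and the variance approaches the prior value of 1.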

Academic Significance and Societal Importance of the Research Achievements

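Labels for speech synthesis must include not only the text but also information the text does not contain, such as prosody. When building diverse speech synthesis systems, for example for spontaneous speech or audiobooks, this raises problems such as the cost of annotating labels. In addition, estimating prosody from text is difficult because the same text can be read differently depending on context, and labels can be inconsistent between different annotators. In this research, we therefore proposed an automated method that uses machine learning to represent prosody in a low-dimensional latent space, laying the groundwork for easier database construction and for expressive speech synthesis with diverse prosody.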

Report

(4 results)
  • 2019 Annual Research Report
  • Final Research Report (PDF)
  • 2018 Research-status Report
  • 2017 Research-status Report

Research Products

(15 results)


Journal Article (1 result; peer reviewed: 1, open access: 1) / Presentation (14 results; international joint research: 4)

  • [Journal Article] Statistical Parametric Speech Synthesis Using Deep Gaussian Processes (2019)

    • Author(s)
      Tomoki Koriyama, Takao Kobayashi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 27 Issue: 5 Pages: 948-959

    • DOI

      10.1109/taslp.2019.2905167

    • Related Report
      2019 Annual Research Report 2018 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit (2020)

    • Author(s)
      Tomoki Koriyama, Hiroshi Saruwatari
    • Organizer
      Proc. 45th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), (May 2020)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Study on Using Stochastic Differential Equation Representations of Functions in Deep Gaussian Process Speech Synthesis (2020)

    • Author(s)
      郡山知樹, 猿渡洋
    • Organizer
      Proceedings of the 2020 Spring Meeting of the Acoustical Society of Japan, 2-Q-44, pp.1127-1128. (Mar. 2020)
    • Related Report
      2019 Annual Research Report
  • [Presentation] A Study on Alignment Prediction Models for Attention-Based Voice Conversion (2020)

    • Author(s)
      芹川武尊, 郡山知樹, 猿渡洋
    • Organizer
      Proceedings of the 2020 Spring Meeting of the Acoustical Society of Japan, 2-2-2, pp.1077-1078. (Mar. 2020)
    • Related Report
      2019 Annual Research Report
  • [Presentation] Multi-Speaker Speech Synthesis Based on Deep Gaussian Processes (2020)

    • Author(s)
      三井健太郎, 郡山知樹, 猿渡洋
    • Organizer
      Proceedings of the 2020 Spring Meeting of the Acoustical Society of Japan, 1-2-2, pp.1043-1044. (Mar. 2020)
    • Related Report
      2019 Annual Research Report
  • [Presentation] A Study on Speech Synthesis Based on Deep Gaussian Processes and Latent Variable Representations of Accent (2020)

    • Author(s)
      三井健太郎, 郡山知樹, 猿渡洋
    • Organizer
      IEICE Technical Report, vol.119, no.398, SP2019-49, pp.31-36
    • Related Report
      2019 Annual Research Report
  • [Presentation] Semi-Supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model (2019)

    • Author(s)
      Tomoki Koriyama, Takao Kobayashi
    • Organizer
      Proc. 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), pp.4450-4454. (Sept. 2019)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Training Method Using DNN-guided Layerwise Pretraining For Deep Gaussian Processes (2019)

    • Author(s)
      Tomoki Koriyama, Takao Kobayashi
    • Organizer
      Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), pp.4785-4789. (May 2019)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Study on Sequence Modeling Using Recurrent Structures in Deep Gaussian Process-Based Speech Synthesis (2019)

    • Author(s)
      郡山知樹, 猿渡洋
    • Organizer
      Proceedings of the 2019 Autumn Meeting of the Acoustical Society of Japan, 1-P-25, pp.1025-1026. (Sept. 2019)
    • Related Report
      2019 Annual Research Report
  • [Presentation] A Training Method Using DNN-guided Layerwise Pretraining For Deep Gaussian Processes (2019)

    • Author(s)
      Tomoki Koriyama, Takao Kobayashi
    • Organizer
      Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] A Study on Speech Synthesis Based on Deep Gaussian Processes and Latent Variable Representations of Accent (2019)

    • Author(s)
      郡山知樹, 小林隆夫
    • Organizer
      IEICE Technical Report
    • Related Report
      2018 Research-status Report
  • [Presentation] A Study on Pretraining for Deep Gaussian Process-Based Speech Synthesis (2018)

    • Author(s)
      郡山知樹, 小林隆夫
    • Organizer
      Proceedings of the 2018 Autumn Meeting of the Acoustical Society of Japan
    • Related Report
      2018 Research-status Report
  • [Presentation] A Study on Using Deep Architectures for GPR Speech Synthesis (2018)

    • Author(s)
      郡山知樹, 小林隆夫
    • Organizer
      Proceedings of the 2018 Spring Meeting of the Acoustical Society of Japan, pp. 1507-1508
    • Related Report
      2017 Research-status Report
  • [Presentation] A Study on Using Deep Gaussian Processes in GPR Speech Synthesis (2018)

    • Author(s)
      郡山知樹, 小林隆夫
    • Organizer
      IEICE Technical Report, Vol. 117, No. 517, pp. 27-32
    • Related Report
      2017 Research-status Report
  • [Presentation] A Study on Statistical Speech Synthesis Based on a GP-DNN Hybrid Model (2018)

    • Author(s)
      郡山知樹, 小林隆夫
    • Organizer
      IEICE Technical Report, Vol. 117, No. 393, pp. 5-10
    • Related Report
      2017 Research-status Report


Published: 2017-04-28   Modified: 2021-02-19  


Powered by NII kakenhi