Harnessing Latent Variation in DNN-Based Speech Synthesis

Research Project

Project/Area Number	17K12720
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	National Institute of Informatics
Principal Investigator	Henter Gustav, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Researcher (30793096)
Project Period (FY)	2017-04-01 – 2018-03-31
Project Status	Discontinued (Fiscal Year 2017)
Budget Amount	¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000); Fiscal Year 2018: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2017: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Keywords	Speech synthesis / Latent variables / Controllable synthesis / Deep learning / Emotional speech / Control
Outline of Annual Research Achievements	With this grant, I derived and published theoretical connections between common heuristic methods used in practice for unsupervised learning of controllable speech synthesisers and latent variables in Bayesian probability. This work includes probabilistic interpretations of common extensions of the practical approach. Related work (both published and submitted) investigated optimal supervised methods for annotating the same data and, separately, considered speech synthesis with multilingual phonetic control. A listening test now under way compares these supervised and unsupervised approaches against variational autoencoders (VAEs). A journal manuscript is in preparation that will report the results together with new theoretical connections between VAEs and common synthesis heuristics.

Report

(1 result)

2017 Annual Research Report

Research Products
(3 results)

Presentations (3 results, of which Int'l Joint Research: 2)

[Presentation] Cyborg speech: Deep multilingual speech synthesis for generating segmental foreign accent with natural prosody (2018)
- Author(s)
  Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, Junichi Yamagishi
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Place of Presentation
  Calgary, Alberta, Canada
- Year and Date
  2018-04-15
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Generating segment-level foreign-accented synthetic speech with natural speech prosody (2018)
- Author(s)
  Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, Junichi Yamagishi
- Organizer
  120th Joint Research Meeting on Spoken Language Information Processing (第120回音声言語情報処理合同研究発表会)
- Place of Presentation
  Tsukubasan Edoya (Tsukuba, Ibaraki Prefecture)
- Year and Date
  2018-02-20
- Related Report
  2017 Annual Research Report
[Presentation] Principles for learning controllable TTS from annotated and latent variation (2017)
- Author(s)
  Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi
- Organizer
  Annual Conference of the International Speech Communication Association (Interspeech)
- Place of Presentation
  Stockholm, Sweden
- Year and Date
  2017-08-20
- Related Report
  2017 Annual Research Report
- Int'l Joint Research