2019 Fiscal Year Annual Research Report

Articulatory text-to-speech synthesis based on digital waveguide mesh driven by deep neural network

Research Project

Project/Area Number	17K20004
Research Institution	Nagoya Institute of Technology
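Principal Investigator	Keiichi Tokuda (徳田恵一), Nagoya Institute of Technology, Graduate School of Engineering, Professor (20217483)
Co-Investigator(Kenkyū-buntansha)	Yoshihiko Nankaku (南角吉彦), Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497)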
Project Period (FY)	2017-06-30 – 2020-03-31
Keywords	Speech synthesis / Speech information processing
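Outline of Annual Research Achievements	The aim of this research is to build a speech synthesis system that can flexibly express any voice quality. To do so, an articulatory model that follows the actual human speech production mechanism is incorporated into a text-to-speech synthesis system, and its usefulness is verified. First, we formulated a two-dimensional digital waveguide mesh articulatory model within the framework of deep neural networks, expressing mathematically the idea of embedding the articulatory model in a text-to-speech system. Based on the derived equations, we then succeeded in building a text-to-speech system that incorporates the articulatory model. However, because we had prioritized verifying that the articulatory model can feasibly be estimated inversely from speech waveforms, we assumed a relatively simple model structure. As a result, we found that the quality of speech generated by the text-to-speech system with the articulatory model has certain limitations. We therefore worked toward more natural speech by aiming to integrate the approach with state-of-the-art waveform generation methods such as WaveNet. In these methods, the waveform model can be regarded as partially containing the structure of an articulatory model. From this perspective, with the goals of investigating the relationship between articulatory and waveform models and of integrating the two, we studied control of voice quality and emotion in synthetic speech. By adding inputs such as speaker codes and phrase codes to a DNN-based acoustic model, we realized a speech synthesis system in which voice quality and emotion can be controlled. Applying training methods such as adversarial training further enabled higher-quality synthetic speech. We also improved modeling accuracy by, for example, hierarchically structuring the latent variables that represent speaking style.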
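The outline does not give the project's formulation, so the following is only a minimal illustrative sketch of the standard building block behind a 2-D digital waveguide mesh, not the authors' model. It assumes a uniform-impedance rectilinear mesh (four ports per junction) written in junction-pressure form, p[n+1] = 0.5 * (sum of the four neighbouring junction pressures at step n) - p[n-1], with edge junctions held at zero pressure (a pressure-release boundary chosen for simplicity). The function names `dwm_step` and `simulate` are hypothetical, and no neural network is modeled here.

```python
from typing import List, Tuple

Grid = List[List[float]]


def dwm_step(p_curr: Grid, p_prev: Grid) -> Grid:
    """Advance a 2-D rectilinear digital waveguide mesh by one time step.

    Junction-pressure form of a uniform mesh with four ports per junction:
    p[n+1] = (2/4) * sum(neighbours at n) - p[n-1].
    Edge junctions stay at zero pressure (pressure-release boundary),
    a simplifying assumption of this sketch.
    """
    rows, cols = len(p_curr), len(p_curr[0])
    p_next = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbours = (p_curr[i - 1][j] + p_curr[i + 1][j]
                          + p_curr[i][j - 1] + p_curr[i][j + 1])
            p_next[i][j] = 0.5 * neighbours - p_prev[i][j]
    return p_next


def simulate(rows: int, cols: int, steps: int,
             excite_at: Tuple[int, int],
             probe_at: Tuple[int, int]) -> List[float]:
    """Inject a unit impulse at excite_at and record pressure at probe_at."""
    p_prev = [[0.0] * cols for _ in range(rows)]
    p_curr = [[0.0] * cols for _ in range(rows)]
    p_curr[excite_at[0]][excite_at[1]] = 1.0  # unit impulse at time step 0
    out = [p_curr[probe_at[0]][probe_at[1]]]
    for _ in range(steps):
        # The right-hand side is evaluated first, so dwm_step receives the
        # old (p_curr, p_prev) pair before the names are rebound.
        p_prev, p_curr = p_curr, dwm_step(p_curr, p_prev)
        out.append(p_curr[probe_at[0]][probe_at[1]])
    return out


if __name__ == "__main__":
    # Impulse response at the junction next to the excitation point.
    print(simulate(9, 9, 8, excite_at=(4, 4), probe_at=(4, 5)))
```

In an articulatory text-to-speech system of the kind described above, a neural network would presumably supply time-varying mesh parameters (for example, per-junction impedances standing in for vocal-tract cross-sections); that coupling is deliberately left out of this sketch.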

Research Products
(13 results)

Journal Articles (2 results; of which Int'l Joint Research: 1, Peer Reviewed: 2); Presentations (11 results; of which Int'l Joint Research: 5, Invited: 2)

[Journal Article] A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis, 2019
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi, Simon King, Keiichi Tokuda
- Journal Title
  IEEE/ACM Transactions on Audio, Speech and Language Processing
  Volume: 28 Pages: 157-170
- DOI
  10.1109/TASLP.2019.2950099
- Peer Reviewed / Int'l Joint Research
[Journal Article] Neural source-filter waveform models for statistical parametric speech synthesis, 2019
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Journal Title
  IEEE/ACM Transactions on Audio, Speech and Language Processing
  Volume: 28 Pages: 402-415
- DOI
  10.1109/TASLP.2019.2956145
- Peer Reviewed
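[Presentation] Semi-supervised learning based on a hierarchical generative model for end-to-end speech synthesis (End-to-End音声合成のための階層化生成モデルに基づく半教師あり学習), 2020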
- Author(s)
  藤本崇人, 高木信二, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) 2020 Spring Meeting
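[Presentation] A study of singing voice synthesis based on an attention mechanism using musical-score timing information (楽譜時間情報を用いたアテンション機構に基づく歌声合成の検討), 2019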
- Author(s)
  村田舜馬, 藤本崇人, 法野行哉, 高木信二, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) 2019 Autumn Meeting
[Presentation] Singing voice synthesis based on generative adversarial networks, 2019
- Author(s)
  Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] DNN-based real-time speech vocoder using periodic and aperiodic signals (周期・非周期信号を用いたDNNに基づくリアルタイム音声ボコーダ), 2019
- Author(s)
  大浦圭一郎, 中村和寛, 橋本佳, 南角吉彦, 徳田恵一
- Organizer
  IPSJ SIG Technical Report, Information Processing Society of Japan (情報処理学会研究報告)
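[Presentation] Real-time speech vocoder based on generative adversarial networks using periodic and aperiodic signals (周期・非周期信号を用いた敵対的生成ネットワークに基づくリアルタイム音声ボコーダ), 2019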
- Author(s)
  大浦圭一郎, 高木信二, 中村和寛, 橋本佳, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan (ASJ) 2019 Autumn Meeting
[Presentation] Statistical approach to speech synthesis: past, present and future, 2019
- Author(s)
  Keiichi Tokuda
- Organizer
  Interspeech 2019
- Int'l Joint Research / Invited
[Presentation] Deep neural network based real-time speech vocoder with periodic and aperiodic inputs, 2019
- Author(s)
  Keiichiro Oura, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  10th ISCA Speech Synthesis Workshop (SSW10)
- Int'l Joint Research
[Presentation] Impacts of input linguistic feature representation on Japanese end-to-end speech synthesis, 2019
- Author(s)
  Takato Fujimoto, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  10th ISCA Speech Synthesis Workshop (SSW10)
- Int'l Joint Research
[Presentation] Low computational cost speech synthesis based on deep neural networks using hidden semi-Markov model structures, 2019
- Author(s)
  Motoki Shimada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  10th ISCA Speech Synthesis Workshop (SSW10)
- Int'l Joint Research
[Presentation] Progress and prospects of statistical speech synthesis (統計的音声合成の進展と展望), 2019
- Author(s)
  徳田恵一
- Organizer
  Technical Meeting on Speech (音声研究会)
- Invited
[Presentation] A comparative study of neural vocoders for singing voice synthesis (歌声合成におけるニューラルボコーダの比較検討), 2019
- Author(s)
  和田蒼汰, 法野行哉, 高木信二, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Technical Meeting on Speech (音声研究会)
