Articulatory text-to-speech synthesis based on digital waveguide mesh driven by deep neural network

Research Project

Project/Area Number	17K20004
Research Category	Grant-in-Aid for Challenging Research (Exploratory)
Allocation Type	Multi-year Fund
Research Field	Human informatics and related fields
Research Institution	Nagoya Institute of Technology
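Principal Investigator	Tokuda Keiichi, Nagoya Institute of Technology, Graduate School of Engineering, Professor (20217483)
Co-Investigator (Kenkyū-buntansha)	Nankaku Yoshihiko, Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497)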
Project Period (FY)	2017-06-30 – 2020-03-31
Project Status	Completed (Fiscal Year 2019)
Budget Amount	¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000)
Fiscal Year 2019: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Fiscal Year 2018: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
Fiscal Year 2017: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
Keywords	speech synthesis / speech information processing / neural network / articulatory model
Outline of Final Research Achievements	To build a speech synthesis system that can flexibly generate expressive speech, we developed a deep-neural-network-based text-to-speech synthesis system that incorporates an articulatory model grounded in the human speech production mechanism. To improve voice quality, we attempted to combine it with WaveNet and other deep-neural-network-based speech waveform generation methods. We also examined methods for controlling voice quality and emotional expression based on generative adversarial training.
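Academic Significance and Societal Importance of the Research Achievements	As advanced information devices such as smartphones and smart speakers spread rapidly, expectations are growing for speech interfaces as a means of exchanging information between these devices and people. To hold natural conversations with such machines, the synthesized speech they output must be able to take on any voice quality at will and to convey a wide range of emotional expression. This research contributes to realizing machines that speak like humans.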

Report

(4 results)

2019 Annual Research Report
Final Research Report (PDF)
2018 Research-status Report
2017 Research-status Report

Research Products
(43 results)

Int'l Joint Research (1 result) / Journal Article (3 results, of which Int'l Joint Research: 1, Peer Reviewed: 3, Open Access: 1) / Presentation (39 results, of which Int'l Joint Research: 15, Invited: 2)

[Int'l Joint Research] University of York (UK)
- Related Report
  2017 Research-status Report
[Journal Article] Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis (2020)
- Author(s)
  Wang Xin, Takaki Shinji, Yamagishi Junichi
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 28 Pages: 402-415
- DOI
  10.1109/taslp.2019.2956145
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] A vector quantized variational autoencoder (VQ-VAE) autoregressive neural F0 model for statistical parametric speech synthesis (2019)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi, Simon King, and Keiichi Tokuda
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 28 Pages: 157-170
- DOI
  10.1109/taslp.2019.2950099
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Mel-cepstrum-based quantization noise shaping applied to neural-network-based speech waveform synthesis (2018)
- Author(s)
  Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 26 Issue: 7 Pages: 1173-1180
- DOI
  10.1109/taslp.2018.2818408
- Related Report
  2018 Research-status Report
- Peer Reviewed
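[Presentation] Semi-supervised learning based on a hierarchical generative model for end-to-end speech synthesis (2020)
- Author(s)
  Takato Fujimoto, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2020 Spring Meeting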
- Related Report
  2019 Annual Research Report
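[Presentation] A study of singing voice synthesis based on an attention mechanism using musical-score timing information (2019)
- Author(s)
  Shumma Murata, Takato Fujimoto, Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting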
- Related Report
  2019 Annual Research Report
[Presentation] Singing voice synthesis based on generative adversarial networks (2019)
- Author(s)
  Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
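[Presentation] DNN-based real-time speech vocoder using periodic and aperiodic signals (2019)
- Author(s)
  Keiichiro Oura, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Information Processing Society of Japan (IPSJ) SIG Technical Reports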
- Related Report
  2019 Annual Research Report
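[Presentation] Real-time speech vocoder based on generative adversarial networks using periodic and aperiodic signals (2019)
- Author(s)
  Keiichiro Oura, Shinji Takaki, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting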
- Related Report
  2019 Annual Research Report
[Presentation] Statistical approach to speech synthesis: past, present and future (2019)
- Author(s)
  Keiichi Tokuda
- Organizer
  Interspeech 2019
- Related Report
  2019 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Deep neural network based real-time speech vocoder with periodic and aperiodic inputs (2019)
- Author(s)
  Keiichiro Oura, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  10th ISCA Speech Synthesis Workshop (SSW10)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Impacts of input linguistic feature representation on Japanese end-to-end speech synthesis (2019)
- Author(s)
  Takato Fujimoto, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  10th ISCA Speech Synthesis Workshop (SSW10)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Low computational cost speech synthesis based on deep neural networks using hidden semi-Markov model structures (2019)
- Author(s)
  Motoki Shimada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  10th ISCA Speech Synthesis Workshop (SSW10)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] Progress and prospects of statistical speech synthesis (2019)
- Author(s)
  Keiichi Tokuda
- Organizer
  Speech Research Meeting (音声研究会)
- Related Report
  2019 Annual Research Report
- Invited
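[Presentation] A comparative study of neural vocoders for singing voice synthesis (2019)
- Author(s)
  和田蒼汰, Yukiya Hono, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Speech Research Meeting (音声研究会)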
- Related Report
  2019 Annual Research Report
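[Presentation] A study of computational cost reduction methods for DNN-based speech synthesis using hidden semi-Markov model structures (2019)
- Author(s)
  Motoki Shimada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2019 Spring Meeting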
- Related Report
  2018 Research-status Report
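[Presentation] Impact of input linguistic features on Japanese end-to-end speech synthesis (2019)
- Author(s)
  Takato Fujimoto, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2019 Spring Meeting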
- Related Report
  2018 Research-status Report
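[Presentation] Speech vocoder based on a deep neural network driven by periodic and aperiodic signals (2019)
- Author(s)
  Keiichiro Oura, Kazuhiro Nakamura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2019 Spring Meeting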
- Related Report
  2018 Research-status Report
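[Presentation] A study of singing voice synthesis using generative adversarial networks (2019)
- Author(s)
  Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2019 Spring Meeting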
- Related Report
  2018 Research-status Report
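[Presentation] A study of adversarial training for DNN-based emotional speech synthesis (2019)
- Author(s)
  角谷健太, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2019 Spring Meeting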
- Related Report
  2018 Research-status Report
[Presentation] Singing voice synthesis based on generative adversarial networks (2019)
- Author(s)
  Yukiya Hono, Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Singing Voice Conversion Using Posted Waveform Data on Music Social Media (2018)
- Author(s)
  Koki Senda, Yukiya Hono, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Recent Development of the DNN-based Singing Voice Synthesis System -- Sinsy (2018)
- Author(s)
  Yukiya Hono, Shumma Murata, Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Speech Synthesis Using WaveNet Vocoder Based on Periodic/Aperiodic Decomposition (2018)
- Author(s)
  Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Speaker Adaptation for Speech Synthesis Based on Deep Neural Networks Using Hidden Semi-Markov Model Structures (2018)
- Author(s)
  Kento Nakao, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] The NITech text-to-speech system for the Blizzard Challenge 2018 (2018)
- Author(s)
  Kei Sawada, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Blizzard Challenge 2018 Workshop
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Statistical voice conversion based on WaveNet (2018)
- Author(s)
  Jumpei Niwa, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Related Report
  2018 Research-status Report
- Int'l Joint Research
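[Presentation] Speech synthesis using a WaveNet vocoder based on periodic/aperiodic component decomposition (2018)
- Author(s)
  Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting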
- Related Report
  2018 Research-status Report
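[Presentation] Singing voice synthesis system based on deep neural networks -- Sinsy (2018)
- Author(s)
  Yukiya Hono, Shumma Murata, Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting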
- Related Report
  2018 Research-status Report
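[Presentation] The NITech text-to-speech system for the Blizzard Challenge 2018 (2018)
- Author(s)
  Kei Sawada, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2018 Autumn Meeting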
- Related Report
  2018 Research-status Report
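[Presentation] A study of speaker adaptation for neural-network-based speech synthesis that accounts for temporal structure (2018)
- Author(s)
  Kento Nakao, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Speech Research Meeting (音声研究会)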
- Related Report
  2018 Research-status Report
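[Presentation] Trajectory training considering power for DNN-based speech synthesis (2018)
- Author(s)
  船戸涼平, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Speech Research Meeting (音声研究会)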
- Related Report
  2017 Research-status Report
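[Presentation] Application of mel-cepstrum-based noise-shaping quantization to WaveNet speech synthesis (2018)
- Author(s)
  Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Speech Research Meeting (音声研究会)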
- Related Report
  2017 Research-status Report
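[Presentation] A study of WaveNet-based voice conversion (2018)
- Author(s)
  Jumpei Niwa, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Speech Research Meeting (音声研究会)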
- Related Report
  2017 Research-status Report
[Presentation] Overview of the Blizzard Machine Learning Challenge 2017 (2018)
- Author(s)
  Kei Sawada, Keiichi Tokuda, Simon King, Alan W Black
- Organizer
  Acoustical Society of Japan 2018 Spring Meeting
- Related Report
  2017 Research-status Report
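[Presentation] Singing voice synthesis based on neural networks using hidden semi-Markov model structures (2018)
- Author(s)
  Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2018 Spring Meeting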
- Related Report
  2017 Research-status Report
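[Presentation] Singing voice synthesis using a DNN-based vocal timing model (2018)
- Author(s)
  Shumma Murata, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2018 Spring Meeting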
- Related Report
  2017 Research-status Report
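[Presentation] Application of mel-cepstrum-based noise-shaping quantization in WaveNet (2017)
- Author(s)
  Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2017 Autumn Meeting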
- Related Report
  2017 Research-status Report
[Presentation] WaveNet-based voice conversion (2017)
- Author(s)
  Jumpei Niwa, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2017 Autumn Meeting
- Related Report
  2017 Research-status Report
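[Presentation] The NITech text-to-speech system for the Blizzard Challenge 2017 (2017)
- Author(s)
  Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda
- Organizer
  Acoustical Society of Japan 2017 Autumn Meeting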
- Related Report
  2017 Research-status Report
[Presentation] Articulatory text-to-speech synthesis using the digital waveguide mesh driven by a deep neural network (2017)
- Author(s)
  Amelia J. Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  INTERSPEECH 2017
- Related Report
  2017 Research-status Report
- Int'l Joint Research
[Presentation] The NITech text-to-speech system for the Blizzard Challenge 2017 (2017)
- Author(s)
  Kei Sawada, Kei Hashimoto, Keiichiro Oura, and Keiichi Tokuda
- Organizer
  Blizzard Challenge 2017 Workshop
- Related Report
  2017 Research-status Report
- Int'l Joint Research
[Presentation] The Blizzard Machine Learning Challenge 2017 (2017)
- Author(s)
  Kei Sawada, Keiichi Tokuda, Simon King, and Alan W Black
- Organizer
  2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- Related Report
  2017 Research-status Report
- Int'l Joint Research
