High-quality speech synthesis based on automatically-retrieved speech constraints

Research Project

Project/Area Number	16H06681
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Single-year Grants
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	Takamichi Shinnosuke, The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90784330)
Project Period (FY)	2016-08-26 – 2018-03-31
Project Status	Completed (Fiscal Year 2017)
Budget Amount	¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000); Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2016: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Keywords	speech synthesis / anti-spoofing / deep learning / speaker verification / voice spoofing / speech processing / voice conversion / machine learning
Outline of Final Research Achievements	This project proposes a speech synthesis method that incorporates generative adversarial networks (GANs). One cause of quality degradation in synthetic speech is the oversmoothing effect often observed in generated speech parameters. In the proposed framework, a discriminator is trained to distinguish natural speech parameters from generated ones. Because the GAN objective minimizes the divergence (i.e., the difference in distribution) between natural and generated speech parameters, the proposed method alleviates the oversmoothing of the generated parameters. Our evaluation found that 1) the proposed method generates more natural spectral parameters regardless of its hyperparameter settings, 2) a Wasserstein GAN, which minimizes the Earth Mover's distance, works best for improving synthetic speech quality, and 3) the method can be extended to vocoder-free speech synthesis.
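
The adversarial objectives described in the outline can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the `adv_weight` parameter, and the use of a mean squared error term alongside the adversarial term are all assumptions made for this example, and the record itself does not state the exact loss. The first two functions use the standard cross-entropy GAN losses; the third shows the Wasserstein form, with the Lipschitz constraint on the critic left out.

```python
import math


def bce_discriminator_loss(p_natural, p_generated):
    """Cross-entropy loss for the discriminator (hypothetical helper).

    p_natural / p_generated: discriminator outputs in (0, 1), read as the
    probability that a frame is natural speech. Natural frames are
    labelled 1 and generated frames 0.
    """
    nat = -sum(math.log(p) for p in p_natural) / len(p_natural)
    gen = -sum(math.log(1.0 - p) for p in p_generated) / len(p_generated)
    return nat + gen


def generator_loss(natural, generated, p_generated, adv_weight=1.0):
    """Mean squared error plus a weighted adversarial term (assumed form).

    The adversarial term is small when the discriminator believes the
    generated frames are natural, so minimizing it pulls the generated
    distribution toward the natural one.
    """
    mse = sum((n - g) ** 2 for n, g in zip(natural, generated)) / len(natural)
    adv = -sum(math.log(p) for p in p_generated) / len(p_generated)
    return mse + adv_weight * adv


def wasserstein_losses(score_natural, score_generated):
    """Critic and generator losses in the Wasserstein GAN formulation.

    Scores are unbounded critic outputs. The Lipschitz constraint on the
    critic (e.g. weight clipping) is omitted from this sketch.
    """
    mean_nat = sum(score_natural) / len(score_natural)
    mean_gen = sum(score_generated) / len(score_generated)
    critic_loss = mean_gen - mean_nat  # critic widens the score gap
    gen_adv_loss = -mean_gen           # generator raises its own score
    return critic_loss, gen_adv_loss


if __name__ == "__main__":
    # Toy one-dimensional "speech parameters"; values are illustrative only.
    natural = [0.2, -0.5, 0.9, -0.1]
    oversmoothed = [0.1, 0.1, 0.1, 0.1]
    # Same generated frames, judged unnatural vs. natural by the discriminator.
    print(generator_loss(natural, oversmoothed, [0.1] * 4))
    print(generator_loss(natural, oversmoothed, [0.9] * 4))
```

With the squared error held fixed, the generator loss is lower when the discriminator rates the generated frames as natural, which is the pressure that counteracts oversmoothing in this kind of setup.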

Report

(3 results)

2017 Annual Research Report, Final Research Report (PDF)
2016 Annual Research Report

Research Products
(28 results)

Journal Article (3 results; of which Peer Reviewed: 3, Open Access: 3) / Presentation (23 results; of which Int'l Joint Research: 6, Invited: 1) / Remarks (2 results)

[Journal Article] Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2018)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 26 Issue: 1 Pages: 84-96
- DOI
  10.1109/taslp.2017.2761547
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Voice Conversion Using Input-to-Output Highway Networks (2017)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E100.D Issue: 8 Pages: 1925-1928
- DOI
  10.1587/transinf.2017EDL8034
- NAID
  130005876129
- ISSN
  0916-8532, 1745-1361
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Voice conversion using input-to-output highway networks (2017)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E100-D
- NAID
  130005876129
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks (2018)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  IEEE ICASSP
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors (2018)
- Author(s)
  Yuki Saito, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi
- Organizer
  IEEE ICASSP
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
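[Presentation] Adversarial DNN-based speech synthesis using STFT spectra with multiple frequency resolutions (2018)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2018 Spring Meeting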
- Related Report
  2017 Annual Research Report
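[Presentation] Revisiting feature analysis for high-quality voice conversion (2018)
- Author(s)
  Hitoshi Suda, Gaku Kotani, Shinnosuke Takamichi, Daisuke Saito
- Organizer
  Acoustical Society of Japan (ASJ) 2018 Spring Meeting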
- Related Report
  2017 Annual Research Report
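[Presentation] Adversarial training of noise generation models for DNN-based speech synthesis using speech recorded in noisy environments (2018)
- Author(s)
  Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2018 Spring Meeting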
- Related Report
  2017 Annual Research Report
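[Presentation] Modulation spectrum-constrained trajectory training and adaptation for GMM-based eigenvoice conversion (2017)
- Author(s)
  Shinnosuke Takamichi
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus (Kanagawa Prefecture)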
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
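[Presentation] A study on random generation of speech parameters using a moment matching network (2017)
- Author(s)
  Shinnosuke Takamichi
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus (Kanagawa Prefecture)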
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
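[Presentation] Voice conversion using sequence-to-sequence learning of context posterior probabilities (2017)
- Author(s)
  Hiroyuki Miyoshi
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus (Kanagawa Prefecture)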
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
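[Presentation] Generation of F0 and duration in adversarial DNN-based speech synthesis (2017)
- Author(s)
  Yuki Saito
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus (Kanagawa Prefecture)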
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
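[Presentation] Adversarial DNN-based voice conversion based on the spectral differential method using highway networks (2017)
- Author(s)
  Yuki Saito
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Spring Meeting
- Place of Presentation
  Meiji University, Ikuta Campus (Kanagawa Prefecture)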
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
[Presentation] Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis (2017)
- Author(s)
  Yuki Saito
- Organizer
  IEEE ICASSP
- Place of Presentation
  New Orleans, USA
- Year and Date
  2017-03-05
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
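[Presentation] Training algorithm adversarial to anti-spoofing for DNN-based text-to-speech synthesis (2017)
- Author(s)
  Yuki Saito
- Organizer
  Information Processing Society of Japan (IPSJ)
- Place of Presentation
  Kotohira Grand Hotel Sakuranosho (Kagawa Prefecture)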
- Year and Date
  2017-02-17
- Related Report
  2016 Annual Research Report
[Presentation] Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra (2017)
- Author(s)
  Shinnosuke Takamichi
- Organizer
  APSIPA ASC
- Related Report
  2017 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities (2017)
- Author(s)
  Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  INTERSPEECH
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
[Presentation] Sampling-based speech parameter generation using moment-matching network (2017)
- Author(s)
  Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
- Organizer
  INTERSPEECH
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
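[Presentation] Non-parallel many-to-many voice conversion by variational autoencoders using phonetic posteriorgrams and d-vectors (2017)
- Author(s)
  Yuki Saito, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi
- Organizer
  IEICE Technical Committee on Speech (SP)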
- Related Report
  2017 Annual Research Report
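[Presentation] Adversarial training of noise generation models for speech synthesis using speech recorded in noisy environments (2017)
- Author(s)
  Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, Hiroshi Saruwatari
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP)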
- Related Report
  2017 Annual Research Report
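[Presentation] Voice conversion using sequence-to-sequence learning of context posterior probabilities and evaluation of dual learning (2017)
- Author(s)
  Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  IEICE Technical Committee on Speech (SP)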
- Related Report
  2017 Annual Research Report
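[Presentation] Random generation of speech parameters in speech synthesis based on moment-matching networks (2017)
- Author(s)
  Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
- Organizer
  IPSJ Special Interest Group on Music and Computer (SIG-MUS)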
- Related Report
  2017 Annual Research Report
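[Presentation] Evaluation of inter-utterance variation in ichigo-ichie (once-in-a-lifetime) speech synthesis based on moment-matching networks (2017)
- Author(s)
  Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Autumn Meeting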
- Related Report
  2017 Annual Research Report
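[Presentation] Investigation of the effect of divergences in adversarial DNN-based speech synthesis (2017)
- Author(s)
  Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Acoustical Society of Japan (ASJ) 2017 Autumn Meeting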
- Related Report
  2017 Annual Research Report
[Presentation] Evaluation of DNN-based voice conversion adversarial to anti-spoofing (2017)
- Author(s)
  Yuki Saito
- Organizer
  IEICE 2017 Spring Meeting
- Place of Presentation
  The University of Tokyo, Hongo Campus (Tokyo)
- Related Report
  2016 Annual Research Report
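[Presentation] Training algorithm considering anti-spoofing for DNN-based speech synthesis (2016)
- Author(s)
  Yuki Saito
- Organizer
  Acoustical Society of Japan (ASJ) 2016 Autumn Meeting
- Place of Presentation
  Meiji University, Ikuta Campus (Kanagawa Prefecture)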
- Year and Date
  2016-09-14
- Related Report
  2016 Annual Research Report
[Remarks] Adversarial DNN-Based Text-To-Speech Synthesis
- URL
  http://sython.org/demo/icassp2017advtts/demo.html
- Related Report
  2016 Annual Research Report
[Remarks] Adversarial DNN-Based Voice Conversion
- URL
  http://sython.org/demo/sp201701advvc/demo.html
- Related Report
  2016 Annual Research Report
