
2017 Fiscal Year Final Research Report

High-quality speech synthesis based on automatically-retrieved speech constraints

Research Project

Project/Area Number 16H06681
Research Category

Grant-in-Aid for Research Activity Start-up

Allocation Type Single-year Grants
Research Field Intelligent informatics
Research Institution The University of Tokyo

Principal Investigator

Takamichi Shinnosuke  The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90784330)

Project Period (FY) 2016-08-26 – 2018-03-31
Keywords speech synthesis / anti-spoofing / deep learning / speaker verification
Outline of Final Research Achievements

We propose a speech synthesis method that incorporates generative adversarial networks (GANs). One cause of quality degradation in synthetic speech is the oversmoothing effect often observed in generated speech parameters. In the proposed framework, a discriminator is trained to distinguish natural speech parameters from generated ones. Because the GAN objective minimizes the divergence (i.e., the difference in distribution) between natural and generated speech parameters, the proposed method effectively alleviates oversmoothing in the generated parameters. Our evaluation found that 1) the proposed method generates more natural spectral parameters regardless of its hyperparameter settings, 2) a Wasserstein GAN, which minimizes the Earth-Mover's distance, works best in terms of improving synthetic speech quality, and 3) the method can be extended to vocoder-free speech synthesis.
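The training signals described above can be illustrated with a minimal sketch. It is not the implementation used in this project: the function names, the mean-squared-error term, and the `adv_weight` parameter are assumptions introduced here for illustration, and the one-dimensional Earth-Mover's distance is a closed form that holds only for equal-size scalar samples (a Wasserstein GAN would instead estimate the distance with a critic network).

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(p_natural, p_generated):
    """Binary cross-entropy for a discriminator that outputs
    P(input is natural) for each speech-parameter frame."""
    loss_nat = -sum(math.log(p + EPS) for p in p_natural) / len(p_natural)
    loss_gen = -sum(math.log(1.0 - p + EPS) for p in p_generated) / len(p_generated)
    return loss_nat + loss_gen


def generator_loss(natural, generated, p_generated, adv_weight=1.0):
    """Assumed combination: reconstruction error plus a weighted adversarial
    term that rewards generated frames the discriminator judges natural."""
    mse = sum((n - g) ** 2 for n, g in zip(natural, generated)) / len(natural)
    adv = -sum(math.log(p + EPS) for p in p_generated) / len(p_generated)
    return mse + adv_weight * adv


def earth_movers_distance_1d(a, b):
    """Earth-Mover's distance between two equal-size 1-D empirical samples:
    the mean absolute difference of the sorted values."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Toy illustration of oversmoothing: the generated values keep the mean
# of the natural ones but have a shrunken spread.
natural = [-2.0, -1.0, 0.0, 1.0, 2.0]
oversmoothed = [0.5 * x for x in natural]

emd_smoothed = earth_movers_distance_1d(natural, oversmoothed)
emd_identical = earth_movers_distance_1d(natural, natural)
print(emd_smoothed, emd_identical)
```

In this toy case the oversmoothed sample sits at a nonzero distance from the natural one even though the two means agree, which is the kind of distribution mismatch a distribution-level objective can penalize and a pointwise error averaged around the mean tends to leave in place.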

Free Research Field

Speech synthesis


Published: 2019-03-29  
