2017 Fiscal Year Final Research Report
High-quality speech synthesis based on automatically-retrieved speech constraints
Project/Area Number | 16H06681 |
Research Category | Grant-in-Aid for Research Activity Start-up |
Allocation Type | Single-year Grants |
Research Field | Intelligent informatics |
Research Institution | The University of Tokyo |
Principal Investigator |
|
Project Period (FY) | 2016-08-26 – 2018-03-31 |
Keywords | Speech synthesis / Anti-spoofing / Deep learning / Speaker verification |
Outline of Final Research Achievements |
We propose a speech synthesis method that incorporates generative adversarial networks (GANs). One cause of quality degradation in synthetic speech is the oversmoothing effect often observed in generated speech parameters. In the proposed framework, a discriminator is trained to distinguish natural speech parameters from generated ones. Because the GAN objective minimizes a divergence (i.e., a distribution difference) between natural and generated speech parameters, the proposed method effectively alleviates oversmoothing of the generated parameters. Our evaluation found that 1) the proposed method generates more natural spectral parameters regardless of its hyperparameter settings; 2) a Wasserstein GAN, which minimizes the Earth Mover's distance, works best in terms of improving synthetic speech quality; and 3) the method can be extended to vocoder-free speech synthesis.
|
Free Research Field | Speech synthesis |
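The outline above describes a discriminator trained to separate natural from generated speech parameters, and a generator whose objective includes an adversarial term. The following is a minimal sketch of how such losses are commonly written, not the project's actual implementation: the report does not give the exact loss formulation, so the function names, the `weight` parameter, and the choice of a mean-squared-error term combined with a standard GAN term are all assumptions made for illustration. The Wasserstein variants follow the usual critic formulation and omit the Lipschitz constraint (weight clipping or a gradient penalty) that a real critic needs.

```python
import numpy as np


def discriminator_loss(d_natural, d_generated, eps=1e-12):
    """Standard GAN discriminator loss (hypothetical helper).

    d_natural / d_generated are the discriminator's output probabilities
    (values in [0, 1]) for natural and generated speech parameters.
    """
    real_term = -np.mean(np.log(d_natural + eps))
    fake_term = -np.mean(np.log(1.0 - d_generated + eps))
    return float(real_term + fake_term)


def generator_loss(natural, generated, d_generated, weight=1.0, eps=1e-12):
    """Assumed combined loss: mean squared error plus a weighted adversarial term.

    The adversarial term rewards the generator when the discriminator
    assigns a high "natural" probability to the generated parameters.
    `weight` is an illustrative hyperparameter, not a value from the report.
    """
    mse = np.mean((natural - generated) ** 2)
    adversarial = -np.mean(np.log(d_generated + eps))
    return float(mse + weight * adversarial)


def wgan_critic_loss(c_natural, c_generated):
    """Wasserstein critic loss: the critic outputs unbounded scores.

    Minimizing this pushes scores up for natural and down for generated
    parameters. The Lipschitz constraint is omitted in this sketch.
    """
    return float(np.mean(c_generated) - np.mean(c_natural))


def wgan_generator_adversarial(c_generated):
    """Wasserstein adversarial term for the generator."""
    return float(-np.mean(c_generated))
```

In this sketch, a larger `weight` puts more emphasis on matching the distribution of natural parameters and less on frame-wise error, which is the trade-off the adversarial term is meant to introduce against oversmoothing.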