2017 Fiscal Year Final Research Report

High-quality speech synthesis based on automatically-retrieved speech constraints

Research Project

Project/Area Number	16H06681
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Single-year Grants
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	Takamichi Shinnosuke The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90784330)
Project Period (FY)	2016-08-26 – 2018-03-31
Keywords	speech synthesis / anti-spoofing / deep learning / speaker verification
Outline of Final Research Achievements	We propose a speech synthesis method that incorporates generative adversarial networks (GANs). One cause of quality degradation in synthetic speech is the oversmoothing effect often observed in generated speech parameters. In the proposed framework, a discriminator is trained to distinguish natural speech parameters from generated ones. Because the GAN objective minimizes the divergence (i.e., the difference in distribution) between natural and generated speech parameters, the proposed method effectively alleviates oversmoothing of the generated parameters. Our evaluation found that 1) the proposed method generates more natural spectral parameters regardless of its hyperparameter settings, 2) a Wasserstein GAN, which minimizes the Earth Mover's distance, works best for improving synthetic speech quality, and 3) the method can be extended to vocoder-free speech synthesis.
Free Research Field	Speech synthesis
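
The adversarial objective described in the outline can be illustrated with a minimal sketch. This is not the project's implementation: the linear critic, the function names, the `adv_weight` parameter, the toy data, and the choice to add the adversarial term to a mean squared error are all assumptions made for illustration. The sketch uses the Wasserstein form, in which the critic is trained to score natural parameters higher than generated ones and the generator is penalized when its output scores low.

```python
import random
from statistics import fmean, pvariance

def critic(frame, weights, bias):
    """Hypothetical linear critic: one score per frame (higher = more natural)."""
    return sum(x * w for x, w in zip(frame, weights)) + bias

def critic_loss(natural, generated, weights, bias):
    """Wasserstein critic loss: lower when natural frames outscore generated ones."""
    d_nat = fmean([critic(f, weights, bias) for f in natural])
    d_gen = fmean([critic(f, weights, bias) for f in generated])
    return d_gen - d_nat

def mse(natural, generated):
    """Mean squared error over all frames and dimensions."""
    return fmean([(g - n) ** 2
                  for fn, fg in zip(natural, generated)
                  for n, g in zip(fn, fg)])

def generator_loss(natural, generated, weights, bias, adv_weight=1.0):
    """Assumed combined loss: MSE plus a weighted adversarial term."""
    adv = -fmean([critic(f, weights, bias) for f in generated])
    return mse(natural, generated) + adv_weight * adv

# Toy data: "natural" frames and an oversmoothed copy with reduced variance.
random.seed(0)
natural = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
generated = [[0.5 * x for x in frame] for frame in natural]
weights, bias = [1.0, 1.0, 1.0, 1.0], 0.0

var_natural = pvariance([x for f in natural for x in f])
var_generated = pvariance([x for f in generated for x in f])
print("variance (natural, generated):", var_natural, var_generated)
print("critic loss:", critic_loss(natural, generated, weights, bias))
print("generator loss:", generator_loss(natural, generated, weights, bias))
```

In this toy setup the generated frames have a smaller variance than the natural ones, which is one way oversmoothing shows up. Setting `adv_weight` to zero reduces the generator loss to the plain mean squared error.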