High-quality speech synthesis based on automatically-retrieved speech constraints
Project/Area Number | 16H06681 |
Research Category | Grant-in-Aid for Research Activity Start-up |
Allocation Type | Single-year Grants |
Research Field | Intelligent informatics |
Research Institution | The University of Tokyo |
Principal Investigator | |
Project Period (FY) | 2016-08-26 – 2018-03-31 |
Project Status | Completed (Fiscal Year 2017) |
Budget Amount |
¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000)
Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2016: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
|
Keywords | speech synthesis / anti-spoofing / deep learning / speaker verification / voice spoofing / speech processing / voice conversion / machine learning |
Outline of Final Research Achievements |
A speech synthesis method incorporating generative adversarial networks (GANs) is proposed. One cause of quality degradation in synthetic speech is the oversmoothing effect often observed in the generated speech parameters. In the proposed framework, a discriminator is trained to distinguish natural speech parameters from generated ones, while the generator of the speech parameters is trained to deceive it. Because the GAN objective minimizes a divergence (i.e., a measure of the difference between distributions) between the natural and generated speech parameters, the proposed method effectively alleviates oversmoothing in the generated parameters. Our evaluations showed that 1) the proposed method generates more natural spectral parameters regardless of its hyperparameter settings, 2) among the GANs evaluated, a Wasserstein GAN, which minimizes the Earth-Mover's distance, gives the largest improvement in synthetic speech quality, and 3) the method can be extended to vocoder-free speech synthesis.
|
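The oversmoothing argument above can be illustrated with a toy sketch. It is not the project's implementation: the helper names (`mse`, `earth_movers_distance_1d`, `simulate`) and the data model are hypothetical. Each natural parameter is assumed to be a predictable part plus unpredictable detail. The MSE-optimal output is the conditional mean, which averages the detail away and lowers the variance. An output with the natural spread has a higher MSE but a much smaller Earth-Mover's distance to the natural distribution, which is the quantity a Wasserstein GAN critic approximates. Here the distance is computed exactly for 1-D samples rather than learned by a critic.

```python
# Toy illustration only: hypothetical helpers, not the project's GAN code.
import random
import statistics


def mse(natural, generated):
    """Mean squared error between paired natural and generated parameters."""
    return sum((n - g) ** 2 for n, g in zip(natural, generated)) / len(natural)


def earth_movers_distance_1d(xs, ys):
    """Exact Earth-Mover's (Wasserstein-1) distance between two equal-size
    1-D empirical distributions: the mean absolute gap between sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def simulate(n=5000, detail_std=0.5, seed=0):
    """Assumed data model: natural value = predictable part + random detail."""
    rng = random.Random(seed)
    predictable = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    natural = [p + rng.gauss(0.0, detail_std) for p in predictable]
    # MSE-optimal output (the conditional mean): the detail is averaged away.
    smoothed = list(predictable)
    # Output with natural-looking spread: fresh detail of the same scale.
    sharp = [p + rng.gauss(0.0, detail_std) for p in predictable]
    return natural, smoothed, sharp


if __name__ == "__main__":
    natural, smoothed, sharp = simulate()
    print(f"natural  var={statistics.pvariance(natural):.3f}")
    for name, gen in (("smoothed", smoothed), ("sharp", sharp)):
        print(
            f"{name:8s} var={statistics.pvariance(gen):.3f} "
            f"MSE={mse(natural, gen):.3f} "
            f"EMD={earth_movers_distance_1d(natural, gen):.3f}"
        )
```

Running the script should show the smoothed output winning on MSE but losing on Earth-Mover's distance and variance, which is why a distribution-matching objective counteracts oversmoothing. A real system would work with multi-dimensional spectral parameters and train a critic network instead of sorting samples.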
Report | 3 results |
Research Products | 28 results |