Budget Amount |
¥21,710,000 (Direct Cost: ¥16,700,000, Indirect Cost: ¥5,010,000)
Fiscal Year 2019: ¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000)
Fiscal Year 2018: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
Fiscal Year 2017: ¥9,620,000 (Direct Cost: ¥7,400,000, Indirect Cost: ¥2,220,000)
|
Outline of Final Research Achievements |
"Digital voice cloning technology" is an application of speaker adaptation in speech synthesis. In this study, we proposed new elemental technologies and constructed a database for cloning voices from speech recorded in poor environments, that is, speech not originally intended for speech synthesis. First, we constructed "DR-VCTK", a parallel database that pairs low-quality speech with the original high-quality speech. We then proposed a new neural network structure called the "Multi-modal architecture", which enables digital voice cloning from speech data without text data. In addition, we proposed a new neural network structure incorporating a speaker encoder that uses speech recorded in poor environments as training data, and showed that unsupervised speaker adaptation can be performed even from low-quality speech recorded in non-ideal environments.
|