Robust voice cloning technologies in noisy environments and their applications
Project/Area Number | 17H04687 |
Research Category | Grant-in-Aid for Young Scientists (A) |
Allocation Type | Single-year Grants |
Research Field | Perceptual information processing |
Research Institution | National Institute of Informatics |
Principal Investigator | Yamagishi Junichi, National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (70709352) |
Project Period (FY) | 2017-04-01 – 2020-03-31 |
Project Status | Completed (Fiscal Year 2019) |
Budget Amount | ¥21,710,000 (Direct Cost: ¥16,700,000, Indirect Cost: ¥5,010,000) |
Fiscal Year 2019 | ¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000) |
Fiscal Year 2018 | ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000) |
Fiscal Year 2017 | ¥9,620,000 (Direct Cost: ¥7,400,000, Indirect Cost: ¥2,220,000) |
Keywords | speech information processing / speech synthesis / deep learning / speaker adaptation / speech enhancement / digital clone |
Outline of Final Research Achievements | "Digital voice cloning" is an application of speaker adaptation in speech synthesis. In this study, we proposed new elemental technologies and constructed a database for speech recorded in poor environments rather than under the conditions normally used for speech synthesis. First, we constructed a parallel database, "DR-VCTK", in which low-quality speech is paired with the original high-quality speech. Second, we proposed a new neural network structure, the "Multi-modal architecture", which enables digital cloning of a voice from speech data alone, without text data. Third, we proposed a new neural network structure incorporating a speaker encoder that uses speech recorded in a poor environment as training data, and showed that unsupervised speaker adaptation can be performed even from low-quality speech recorded in a non-ideal environment. |
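Academic Significance and Societal Importance of the Research Achievements | Acoustic models for speech synthesis are normally trained only on high-quality speech recorded in a studio. Consequently, synthesizing speech from recordings that contain noise or reverberation, or that were made with low-quality recording equipment, has not been easy, and it is fair to say that no underlying research theory had been established at all. This research removes these constraints of existing technology and makes digital voice cloning possible even under poor recording conditions or when no ground-truth labels are available. It therefore has academic significance in that it has made speech synthesis and speaker adaptation technology more mature theoretically. In addition, the range of applications for speech synthesis and speaker adaptation technology is expected to grow explosively, so the societal significance is also great. |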
Report (4 results)
Research Products (76 results)