Articulatory text-to-speech synthesis based on digital waveguide mesh driven by deep neural network
Project/Area Number |
17K20004
|
Research Category |
Grant-in-Aid for Challenging Research (Exploratory)
|
Allocation Type | Multi-year Fund |
Research Field |
Human informatics and related fields
|
Research Institution | Nagoya Institute of Technology |
Principal Investigator |
Tokuda Keiichi, Nagoya Institute of Technology, Graduate School of Engineering, Professor (20217483)
|
Co-Investigator(Kenkyū-buntansha) |
Nankaku Yoshihiko (南角 吉彦), Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497)
|
Project Period (FY) |
2017-06-30 – 2020-03-31
|
Project Status |
Completed (Fiscal Year 2019)
|
Budget Amount |
¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000)
Fiscal Year 2019: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Fiscal Year 2018: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
Fiscal Year 2017: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
|
Keywords | speech synthesis / speech information processing / neural network / articulatory model |
Outline of Final Research Achievements |
To build a speech synthesis system that can flexibly generate expressive speech, we developed a text-to-speech synthesis system that incorporates an articulatory model, based on the human speech production mechanism, into a deep neural network-based text-to-speech framework. To improve voice quality, we attempted to combine it with WaveNet and other deep neural network-based speech waveform generation methods. Furthermore, we examined methods for controlling voice quality and emotional expression based on generative adversarial training.
|
Academic Significance and Societal Importance of the Research Achievements |
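As advanced information devices such as smartphones and smart speakers spread rapidly, speech interfaces are expected to serve as a means of exchanging information between these devices and humans. For natural conversation with such machines, the synthesized speech they output must be able to take on any voice quality at will and to convey a variety of emotional expressions. This research contributes to the realization of machines that speak like humans.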
|
Report
(4 results)
Research Products
(43 results)