2022 Fiscal Year Final Research Report
Multilingual speech synthesis based on deep learning to reproduce the speaker and emotion of input speech in different languages
Project/Area Number | 20K11862
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Review Section | Basic Section 61010: Perceptual information processing-related
Research Institution | Nagoya Institute of Technology
Principal Investigator | HASHIMOTO Kei, Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (10635907)
Project Period (FY) | 2020-04-01 – 2023-03-31
Keywords | Speech synthesis
Outline of Final Research Achievements |
To realize multilingual speech synthesis that reproduces the speaker and emotion of input speech in different languages, I have been working on deep neural network (DNN)-based multilingual speech synthesis that separates the speech features associated with the language, speaker, and emotion of the input speech. I proposed multilingual speech synthesis based on adversarial learning to separate language and speaker features, and a model structure to separate speaker and emotion features. Additionally, I proposed a speech synthesis model that uses face images as auxiliary features. The proposed methods are expected to enable more natural global communication by generating speech that reproduces the characteristics of the speaker in different languages.
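As a rough illustration of the adversarial separation mentioned above, the sketch below shows the standard gradient-reversal approach to removing language information from a speaker embedding. It is a minimal PyTorch example under assumed settings, not the report's actual model; all module names, feature dimensions, and losses are illustrative.

```python
# Minimal sketch (assumptions, not the report's model): adversarial feature
# separation with a gradient reversal layer (GRL). A language classifier is
# trained on the speaker embedding, but its gradients are reversed before
# reaching the encoder, so the encoder learns to discard language cues.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class SpeakerEncoder(nn.Module):
    """Maps acoustic features to a speaker embedding (hypothetical sizes)."""
    def __init__(self, feat_dim=80, emb_dim=128, num_languages=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )
        # Adversarial branch: through the GRL, minimizing its loss pushes
        # language information *out* of the speaker embedding.
        self.lang_classifier = nn.Sequential(
            nn.Linear(emb_dim, 64), nn.ReLU(),
            nn.Linear(64, num_languages),
        )

    def forward(self, feats, lambd=1.0):
        emb = self.encoder(feats)
        lang_logits = self.lang_classifier(GradReverse.apply(emb, lambd))
        return emb, lang_logits

# Usage: in a full system, a synthesis loss would pull the embedding toward
# speaker identity while this adversarial loss removes language information.
model = SpeakerEncoder()
feats = torch.randn(4, 80)                # a batch of frame-level features
lang_labels = torch.tensor([0, 1, 0, 1])  # language IDs for the batch
emb, lang_logits = model(feats)
adv_loss = nn.functional.cross_entropy(lang_logits, lang_labels)
adv_loss.backward()  # reversed gradients make the embedding language-independent
```

The appeal of the gradient reversal layer is that a single backward pass trains the language classifier normally while driving the encoder in the opposite direction, which is what yields a speaker embedding shared across languages.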
Free Research Field | Speech information processing
Academic Significance and Societal Importance of the Research Achievements |
This research focused on three characteristics contained in speech, namely speaker, language, and emotion, and worked to establish multilingual speech synthesis technology that reproduces the voice quality and emotion of input speech in a language different from that of the input. By applying the results of this research to speech translation systems, it is expected that natural communication, including emotional expression, can be realized in one's own voice even in languages one cannot speak.