Can we reduce misperceptions of emotional content of speech in the noisy environments?
Project/Area Number |
19K24373
|
Research Category |
Grant-in-Aid for Research Activity Start-up
|
Allocation Type | Multi-year Fund |
Review Section |
1002:Human informatics, applied informatics and related fields
|
Research Institution | National Institute of Informatics |
Principal Investigator |
Zhao Yi, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Researcher (10843162)
|
Project Period (FY) |
2019-08-30 – 2021-03-31
|
Project Status |
Completed (Fiscal Year 2020)
|
Budget Amount |
¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
|
Keywords | VQVAE / emotional enhancement / neural networks / voice conversion / Lombard speech / Adversarial network / emotion enhancement / speaker embedding / neural vocoder / F0 encoder / speech perception / Lombard effect / deep learning |
Outline of Research at the Start |
Our research aims to reduce misunderstanding of the emotional content of speech produced under noisy conditions. We will first learn how well-trained speakers modify their emotional speech in noisy environments. We will then apply these learned modifications to less-trained speakers so that their emotional speech in noise is less easily confused. Finally, we will extend the study to enhance the emotion of speech for any given speaker in noisy environments.
|
Outline of Final Research Achievements |
In real-life conditions, people often need to express their emotions through speech in noisy environments. In the past year, we mainly explored ways to reduce misperceptions of the emotional content of speech in such environments. We found that VQ-VAE-based speech waveforms typically have inappropriate prosodic structure. We therefore introduced an important extension to VQ-VAE that learns F0-related suprasegmental information simultaneously with phoneme features. We have published a conference paper on this work. We have also tried to convert emotional speech in clean environments into emotional speech with the Lombard effect using VQ-VAE, and have investigated various adversarial networks to improve the emotional intelligibility of the decoded speech.
|
Academic Significance and Societal Importance of the Research Achievements |
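This work improves the efficiency of human communication under adverse conditions by enhancing emotional expression in noisy environments. It can also generate appropriate, noise-robust emotional speech for a specific speaker.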
|
Report
(3 results)
Research Products
(13 results)