2020 Fiscal Year Annual Research Report
Can we reduce misperceptions of the emotional content of speech in noisy environments?
Project/Area Number | 19K24373 |
Research Institution | National Institute of Informatics |
Principal Investigator | Zhao Yi, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Researcher (10843162) |
Project Period (FY) | 2019-08-30 – 2021-03-31 |
Keywords | VQ-VAE / emotional enhancement / neural networks / voice conversion / Lombard speech / adversarial networks |
Outline of Annual Research Achievements |
In real-life conditions, people often need to convey their emotions through speech in noisy environments. Over the past year, we mainly explored how to reduce misperceptions of the emotional content of speech in such environments. We found that speech waveforms generated by VQ-VAE typically have inappropriate prosodic structure. We therefore introduced an important extension to VQ-VAE that learns F0-related suprasegmental information simultaneously with phone features, and we published a conference paper on this work. Using the VQ-VAE framework, we also attempted to convert emotional speech recorded in clean conditions into emotional speech with the Lombard effect. In addition, we investigated various adversarial networks to improve the emotional intelligibility of the decoded speech.
|
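The core idea of the extension described above — quantizing phone-like features and F0-related suprasegmental features with separate codebooks before decoding — can be illustrated with a minimal sketch. This is an assumed toy setup (all names, dimensions, and codebook sizes are hypothetical), not the published implementation:

```python
import numpy as np

def quantize(frames, codebook):
    """Map each frame to its nearest codebook entry (Euclidean distance)."""
    # distances: (num_frames, codebook_size)
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
# Stand-ins for encoder outputs: a phone branch and an F0 branch.
frames_phone = rng.normal(size=(100, 64))
frames_f0 = rng.normal(size=(100, 8))
# Hypothetical codebooks: a larger one for phones, a small one for F0 patterns.
cb_phone = rng.normal(size=(256, 64))
cb_f0 = rng.normal(size=(16, 8))

q_phone, i_phone = quantize(frames_phone, cb_phone)
q_f0, i_f0 = quantize(frames_f0, cb_f0)
# The waveform decoder would consume the concatenated quantized streams,
# so prosodic (F0) information is preserved alongside phonetic content.
decoder_input = np.concatenate([q_phone, q_f0], axis=1)  # shape (100, 72)
```

Keeping the F0 codebook small forces it to capture coarse suprasegmental patterns rather than duplicating phonetic detail.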