Can we reduce misperceptions of emotional content of speech in the noisy environments?

Research Project

Project/Area Number	19K24373
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Multi-year Fund
Review Section	1002:Human informatics, applied informatics and related fields
Research Institution	National Institute of Informatics
Principal Investigator	Zhao Yi, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Researcher (10843162)
Project Period (FY)	2019-08-30 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000); Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords	VQVAE / emotional enhancement / neural networks / voice conversion / Lombard speech / Adversarial network / emotion enhancement / speaker embedding / neural vocoder / F0 encoder / speech perception / Lombard effect / deep learning
Outline of Research at the Start	Our proposed research aims to reduce misunderstanding of the emotional content of speech produced in noisy conditions. We will first learn how well-trained speakers modify their emotional speech in noisy environments. We will then apply these learned modifications to the speech of less-trained speakers, so that their emotional speech in noise becomes less confusable. Finally, we will extend the study to enhance the emotion of speech for any given speaker in noisy environments.
Outline of Final Research Achievements	In real-life conditions, people often need to express their emotions through speech in noisy environments. Over the past year, we mainly explored ways to reduce misperceptions of the emotional content of speech in such environments. We found that speech waveforms generated by VQ-VAE-based models typically have inappropriate prosodic structure. We therefore introduced an extension to VQ-VAE that learns F0-related suprasegmental information simultaneously with phoneme features, and published a conference paper on this work. We also attempted to convert emotional speech recorded in clean environments into emotional speech with the Lombard effect within the VQ-VAE framework, and investigated various adversarial networks to improve the emotional intelligibility of the decoded speech.
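Academic Significance and Societal Importance of the Research Achievements	By enhancing emotional expression in noisy environments, this work improves the efficiency of human communication under adverse conditions. It can also generate appropriate, noise-robust emotional speech for a specific speaker.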

Report

(3 results)

2020 Annual Research Report
Final Research Report (PDF)
2019 Research-status Report

Research Products
(13 results)

Int'l Joint Research (5 results) Journal Article (4 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 4 results, Open Access: 4 results) Presentation (2 results) (of which Invited: 2 results) Remarks (2 results)

[Int'l Joint Research] Massachusetts Institute of Technology (USA)
- Related Report
  2020 Annual Research Report
[Int'l Joint Research] University of Edinburgh (UK)
- Related Report
  2020 Annual Research Report
[Int'l Joint Research] National University of Singapore (Singapore)
- Related Report
  2020 Annual Research Report
[Int'l Joint Research] USTC (China)
- Related Report
  2020 Annual Research Report
[Int'l Joint Research] Aalto University (Finland)
- Related Report
  2019 Research-status Report
[Journal Article] Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction (2020)
- Author(s)
  Zhao Yi, Li Haoyu, Lai Cheng-I, Williams Jennifer, Cooper Erica, Yamagishi Junichi
- Journal Title
  
  Proc. Interspeech 2020
  
  Volume: 2020
- DOI
  10.21437/interspeech.2020-1615
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion (2020)
- Author(s)
  Zhao Yi, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhen-Hua Ling, Tomoki Toda
- Journal Title
  
  Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020
  
  Volume: 2020 Pages: 80-98
- DOI
  10.21437/vcc_bc.2020-14
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions (2020)
- Author(s)
  Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhen-Hua Ling, Junichi Yamagishi, Zhao Yi, Xiaohai Tian, Tomoki Toda
- Journal Title
  
  Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020
  
  Volume: 2020 Pages: 99-120
- DOI
  10.21437/vcc_bc.2020-15
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation (2020)
- Author(s)
  Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi
- Journal Title
  
  ICASSP 2020
  
  Volume: - Pages: 6269-6273
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Modeling and evaluation methods in current voice conversion tasks (2021)
- Author(s)
  Yi Zhao
- Organizer
  The 27th Annual Meeting of the Association for Natural Language Processing (言語処理学会第27回年次大会)
- Related Report
  2020 Annual Research Report
- Invited
[Presentation] Waveform loss-based acoustic modeling for text-to-speech synthesis and speech-to-musical sound transferring (2019)
- Author(s)
  Yi Zhao
- Organizer
  Seminar in National University of Singapore
- Related Report
  2019 Research-status Report
- Invited
[Remarks] Samples for emotional clean/noisy speech
- URL
  https://nii-yamagishilab.github.io/EmotionaLombardSpeech/
- Related Report
  2019 Research-status Report
[Remarks] Samples for neural waveform vocoders
- URL
  https://nii-yamagishilab.github.io/samples-nsf/neural-music.html
- Related Report
  2019 Research-status Report
