
Can we reduce misperceptions of the emotional content of speech in noisy environments?

Research Project

Project/Area Number 19K24373
Research Category

Grant-in-Aid for Research Activity Start-up

Allocation Type Multi-year Fund
Review Section 1002: Human informatics, applied informatics and related fields
Research Institution National Institute of Informatics

Principal Investigator

Zhao Yi  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Researcher (10843162)

Project Period (FY) 2019-08-30 – 2021-03-31
Project Status Completed (Fiscal Year 2020)
Budget Amount
¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords VQVAE / emotional enhancement / neural networks / voice conversion / Lombard speech / Adversarial network / emotion enhancement / speaker embedding / neural vocoder / F0 encoder / speech perception / Lombard effect / deep learning
Outline of Research at the Start

Our proposed research aims to reduce misunderstanding of the emotional content of speech produced under noisy conditions. We will first learn the modifications that well-trained speakers make to their emotional speech when they are in noisy environments. We will then apply the modifications learned from the well-trained speakers to less-trained speakers, making the less-trained speakers' emotional speech in noise less confusable. Finally, we will extend our study to enhance the emotion of speech for any given speaker in noisy environments.

Outline of Final Research Achievements

In real-life conditions, people often need to express their emotions with appropriate speech in noisy environments. In the past year, we mainly explored how to reduce misperceptions of the emotional content of speech in such environments. We found that speech waveforms generated by VQ-VAE models typically have inappropriate prosodic structure, so we introduced an important extension to VQ-VAE that learns F0-related suprasegmental information simultaneously with phoneme features; we have published a conference paper on this work. Within the VQ-VAE framework, we have also tried to convert emotional speech recorded in a clean environment into emotional speech with the Lombard effect, and we have investigated various adversarial networks to improve the emotional intelligibility of the decoded speech.
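The F0-codebook extension described above lends itself to a compact sketch. Below is a minimal, illustrative PyTorch implementation of a dual-codebook VQ-VAE in which one encoder/codebook pair captures phoneme-like content at a high frame rate and a second pair captures slowly varying F0-related prosody at a lower frame rate, with a shared decoder reconstructing the waveform from both streams. All class names, layer shapes, strides, codebook sizes, and the single-layer convolutional encoder/decoder stand-ins are assumptions for illustration, not the architecture of the published paper.

```python
# Sketch only: a dual-codebook VQ-VAE (content + F0-style prosody streams).
# Hyperparameters and layer choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour vector quantization with a straight-through estimator."""
    def __init__(self, num_codes, dim, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                          # z: (B, T, dim)
        codes = self.codebook.weight               # (K, dim)
        d = torch.cdist(z, codes.unsqueeze(0).expand(z.size(0), -1, -1))
        idx = d.argmin(dim=-1)                     # (B, T) discrete code indices
        q = self.codebook(idx)                     # (B, T, dim) quantized latents
        # Codebook loss + commitment loss (van den Oord et al., 2017).
        vq_loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()                   # straight-through gradients
        return q, idx, vq_loss

class DualCodebookVQVAE(nn.Module):
    def __init__(self, n_content=256, n_f0=64, dim=64):
        super().__init__()
        # Content encoder: high frame rate (stride 2) for phoneme-level detail.
        self.enc_content = nn.Conv1d(1, dim, kernel_size=4, stride=2, padding=1)
        # F0 encoder: low frame rate (stride 8) for slow prosodic contours.
        self.enc_f0 = nn.Conv1d(1, dim, kernel_size=16, stride=8, padding=4)
        self.vq_content = VectorQuantizer(n_content, dim)
        self.vq_f0 = VectorQuantizer(n_f0, dim)
        # Decoder consumes both streams; a real system would use a neural vocoder.
        self.dec = nn.ConvTranspose1d(2 * dim, 1, kernel_size=4, stride=2, padding=1)

    def forward(self, wav):                        # wav: (B, 1, T)
        zc = self.enc_content(wav).transpose(1, 2) # (B, T/2, dim)
        zf = self.enc_f0(wav).transpose(1, 2)      # (B, T/8, dim)
        qc, _, loss_c = self.vq_content(zc)
        qf, _, loss_f = self.vq_f0(zf)
        # Upsample the coarse prosody stream to the content frame rate.
        qf = F.interpolate(qf.transpose(1, 2), size=qc.size(1)).transpose(1, 2)
        h = torch.cat([qc, qf], dim=-1).transpose(1, 2)
        recon = self.dec(h)                        # (B, 1, T)
        return recon, loss_c + loss_f

model = DualCodebookVQVAE()
wav = torch.randn(2, 1, 1024)                      # dummy waveform batch
recon, vq_loss = model(wav)
loss = F.mse_loss(recon, wav) + vq_loss            # reconstruction + VQ objectives
```

Under this reading, converting clean emotional speech to Lombard-style speech would amount to replacing or transforming the prosody code stream while keeping the content codes fixed; this is one plausible way to frame the conversion experiments described above, not a description of the authors' exact procedure.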

Academic Significance and Societal Importance of the Research Achievements

This work improves the efficiency of human communication under adverse conditions by enhancing emotional expression in noisy environments. It also makes it possible to generate appropriate, noise-robust emotional speech for a given speaker.

Report (3 results)

  • 2020 Annual Research Report
  • Final Research Report (PDF)
  • 2019 Research-status Report

Research Products (13 results)

  • Int'l Joint Research: 5 results
  • Journal Article: 4 results (all Int'l Joint Research, Peer Reviewed, and Open Access)
  • Presentation: 2 results (both Invited)
  • Remarks: 2 results

  • [Int'l Joint Research] Massachusetts Institute of Technology (USA)

    • Related Report
      2020 Annual Research Report
  • [Int'l Joint Research] University of Edinburgh (UK)

    • Related Report
      2020 Annual Research Report
  • [Int'l Joint Research] National University of Singapore (Singapore)

    • Related Report
      2020 Annual Research Report
  • [Int'l Joint Research] USTC (China)

    • Related Report
      2020 Annual Research Report
  • [Int'l Joint Research] Aalto University (Finland)

    • Related Report
      2019 Research-status Report
  • [Journal Article] Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction (2020)

    • Author(s)
      Zhao Yi, Li Haoyu, Lai Cheng-I, Williams Jennifer, Cooper Erica, Yamagishi Junichi
    • Journal Title

      Proc. Interspeech 2020

      Volume: 2020

    • DOI

      10.21437/interspeech.2020-1615

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion (2020)

    • Author(s)
      Zhao Yi, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhen-Hua Ling, Tomoki Toda
    • Journal Title

      Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

      Volume: 2020 Pages: 80-98

    • DOI

      10.21437/vcc_bc.2020-14

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions (2020)

    • Author(s)
      Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhen-Hua Ling, Junichi Yamagishi, Zhao Yi, Xiaohai Tian, Tomoki Toda
    • Journal Title

      Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

      Volume: 2020 Pages: 99-120

    • DOI

      10.21437/vcc_bc.2020-15

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Transferring Neural Speech Waveform Synthesizers to Musical Instrument Sounds Generation (2020)

    • Author(s)
      Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi
    • Journal Title

      ICASSP 2020

      Volume: - Pages: 6269-6273

    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Modeling and evaluation methods in current voice conversion tasks (2021)

    • Author(s)
      Yi Zhao
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2020 Annual Research Report
    • Invited
  • [Presentation] Waveform loss-based acoustic modeling for text-to-speech synthesis and speech-to-musical sound transferring (2019)

    • Author(s)
      Yi Zhao
    • Organizer
      Seminar at the National University of Singapore
    • Related Report
      2019 Research-status Report
    • Invited
  • [Remarks] Samples for emotional clean/noisy speech

    • URL

      https://nii-yamagishilab.github.io/EmotionaLombardSpeech/

    • Related Report
      2019 Research-status Report
  • [Remarks] Samples for neural waveform vocoders

    • URL

      https://nii-yamagishilab.github.io/samples-nsf/neural-music.html

    • Related Report
      2019 Research-status Report

Published: 2019-09-03   Modified: 2022-01-27  
