• Search Research Projects
  • Search Researchers
  • How to Use
  1. Back to project page

2022 Fiscal Year Final Research Report

Development of speech enhancement methods for conveying emotions equivalent to face-to-face communication

Research Project

  • PDF
Project/Area Number 19K20618
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation TypeMulti-year Fund
Review Section Basic Section 90010:Design-related
Research InstitutionThe University of Electro-Communications

Principal Investigator

Kishida Takuya  電気通信大学, 大学院情報理工学研究科, 研究員 (80827907)

Project Period (FY) 2019-04-01 – 2023-03-31
Keywords音声信号処理 / 機械学習 / 声質変換 / 感情音声変換
Outline of Final Research Achievements

In the context of speech communication using communication technologies, accurately conveying paralinguistic information such as emotions, intentions, attitudes, and speaker identities becomes challenging due to the absence of visual and other relevant cues. In this study, we developed a neural network capable of modeling the relationship between paralinguistic information and acoustic features of speech. Our research focused on exploring techniques to convert and enhance speaker identities and emotions. By employing the Boltzmann machine and related models, we were able to propose several approaches. These include a method that enables speaker identity conversion between individuals not included in the model's training, a method that concurrently converts speaker identities and emotions, and a method that decomposes voice into factors, allowing for voice impression conversion through factor manipulation.

Free Research Field

音声信号処理

Academic Significance and Societal Importance of the Research Achievements

本研究で得られた実験結果は、ボルツマンマシンやその関連手法が音声の音響特徴量と非言語情報との関係を表現するのに有効であることを示している。また、画像生成分野で目覚ましい成功を挙げている拡散確率モデルを声質変換課題に適用することに関する研究成果や調査結果は、音声コミュニケーションで声質変換技術をより柔軟に利用するための新たな手法の着想や知見につながった。

URL: 

Published: 2024-01-30  

Information User Guide FAQ News Terms of Use Attribution of KAKENHI

Powered by NII kakenhi