2022 Fiscal Year Final Research Report

Development of speech enhancement methods for conveying emotions equivalent to face-to-face communication

Project/Area Number	19K20618
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 90010: Design-related
Research Institution	The University of Electro-Communications
Principal Investigator	Kishida Takuya The University of Electro-Communications, Graduate School of Informatics and Engineering, Researcher (80827907)
Project Period (FY)	2019-04-01 – 2023-03-31
Keywords	Speech signal processing / Machine learning / Voice conversion / Emotional voice conversion
Outline of Final Research Achievements	In speech communication carried over communication technologies, paralinguistic information such as emotions, intentions, attitudes, and speaker identity is difficult to convey accurately because visual and other relevant cues are absent. In this study, we developed a neural network that models the relationship between paralinguistic information and the acoustic features of speech, and we explored techniques for converting and enhancing speaker identity and emotion. Using the Boltzmann machine and related models, we proposed several methods, including one that converts speaker identity between speakers not included in the model's training data, one that converts speaker identity and emotion simultaneously, and one that decomposes a voice into factors so that voice impression can be converted by manipulating those factors.
Free Research Field	Speech signal processing
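Academic Significance and Societal Importance of the Research Achievements	The experimental results obtained in this study indicate that Boltzmann machines and related methods are effective for representing the relationship between the acoustic features of speech and non-linguistic information. In addition, the research results and survey findings on applying diffusion probabilistic models, which have achieved remarkable success in image generation, to the voice conversion task led to ideas and insights for new methods that allow voice conversion technology to be used more flexibly in speech communication.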