2022 Fiscal Year Final Research Report
Development of speech enhancement methods for conveying emotions equivalent to face-to-face communication
Project/Area Number | 19K20618 |
Research Category | Grant-in-Aid for Early-Career Scientists |
Allocation Type | Multi-year Fund |
Review Section | Basic Section 90010: Design-related |
Research Institution | The University of Electro-Communications |
Principal Investigator | Kishida Takuya The University of Electro-Communications, Graduate School of Informatics and Engineering, Researcher (80827907) |
Project Period (FY) | 2019-04-01 – 2023-03-31 |
Keywords | speech signal processing / machine learning / voice conversion / emotional voice conversion |
Outline of Final Research Achievements |
In speech communication over communication technologies, paralinguistic information such as emotions, intentions, attitudes, and speaker identity is difficult to convey accurately because visual and other relevant cues are absent. In this study, we developed a neural network that models the relationship between paralinguistic information and the acoustic features of speech, and we explored techniques for converting and enhancing speaker identity and emotion. Using the Boltzmann machine and related models, we proposed several approaches: a method for converting speaker identity between speakers not included in the model's training data, a method that converts speaker identity and emotion simultaneously, and a method that decomposes voice into factors so that voice impressions can be converted by manipulating those factors.
|
Free Research Field | Speech signal processing |
Academic Significance and Societal Importance of the Research Achievements |
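The experimental results obtained in this study show that the Boltzmann machine and related methods are effective for representing the relationship between the acoustic features of speech and non-linguistic information. In addition, the research results and survey findings on applying diffusion probabilistic models, which have achieved remarkable success in image generation, to the voice conversion task led to ideas and insights for new methods that allow voice conversion technology to be used more flexibly in speech communication.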
|