Project/Area Number |
18K18069
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 61010: Perceptual information processing-related
|
Research Institution | The University of Electro-Communications |
Principal Investigator |
Nakashika Toru, The University of Electro-Communications, Graduate School of Informatics and Engineering, Associate Professor (90749920)
|
Project Period (FY) |
2018-04-01 – 2021-03-31
|
Project Status |
Completed (Fiscal Year 2020)
|
Budget Amount |
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2020: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2019: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2018: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
|
Keywords | Speech signal processing / Machine learning / Probability and statistics / Voice conversion / Emotional voice conversion / Emotion conversion |
Outline of Final Research Achievements |
In speech signal processing, few established methods perform multiple distinct tasks, such as speaker recognition and emotion recognition, simultaneously. In this research, we focused on the Boltzmann machine, a model with a strong capacity for representing the relationships among various factors, and examined whether it can realize speaker recognition, emotion recognition, speaker conversion, and emotion conversion at the same time. The experimental results showed that all four tasks can be achieved using only a Boltzmann machine. We also found that a Boltzmann machine that jointly represents speakers and emotions outperformed Boltzmann machines that represent only speakers or only emotions, in both recognition accuracy and voice conversion accuracy.
|
Academic Significance and Societal Importance of the Research Achievements |
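The experimental results of this research suggest the effectiveness of the Boltzmann machine, which can untangle the relationships among various feature factors when its energy function is designed appropriately, and we consider this a meaningful research outcome. As secondary outcomes, we also obtained ideas and insights for new methods, such as a variational autoencoder that directly represents complex-valued data, and multi-task learning of voice conversion and speech recognition using a Boltzmann machine that takes into account the chain of linguistic, physiological, and acoustic levels in speech communication.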
|