Project/Area Number |
15K00243
|
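Research Institution | The University of Aizu |
Principal Investigator |
MARKOV K The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (80394998)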
|
Co-Investigator |
MATSUI Tomoko (松井 知子) The Institute of Statistical Mathematics, Department of Statistical Modeling, Professor (10370090)
|
Project Period (FY) |
2015-04-01 – 2018-03-31
|
Keywords | Speech emotion / Music emotion / Neural Networks / Gaussian Process |
Outline of Annual Research Achievements |
Emotions perceived by humans in speech and in music share a common psychological basis, so research in this area can use either speech or music as the source of emotion information. Last year we experimented with speech; this year we turned to music data. Several music databases with emotion labels are available, and we used the MediaEval EmotionInMusic corpus. As with speech, the emotion labels are given as arousal and valence values: continuously changing real numbers in the range [-1.0, 1.0]. Our preliminary experiments using Gaussian Processes and a Kalman filter gave somewhat unsatisfactory results, which prompted us to build a new system based on deep neural networks (DNNs). We implemented emotion models with both feed-forward (FF) and recurrent networks and compared the results. Other similar studies have found that recurrent DNNs achieve better performance; in our case this held only for arousal, while for valence the FF DNN was slightly better. Overall, our results do not contradict other published findings. In addition, we experimented with DNN models for noisy speech recognition, and with DNNs as a model for combining spectral and articulatory information in a phoneme recognition task. We achieved significant improvements in both cases, and our findings have been or are being published.
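The difference between the two model types can be illustrated with a minimal sketch. It is not the project's implementation: the helper names (`matvec`, `random_matrix`, `ff_predict`, `rnn_predict`), the Elman-style recurrence, the tanh output layer and the random, untrained weights are all assumptions made for illustration. A feed-forward network maps each frame's features to an (arousal, valence) pair independently, whereas a recurrent network carries a hidden state from frame to frame, so its output at frame t also depends on earlier frames. The tanh output keeps every prediction inside the label range [-1.0, 1.0].

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical helper: untrained weights drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def ff_predict(frames: List[Vector], w: Matrix, b: Vector) -> List[Vector]:
    """Feed-forward model: each frame is mapped to (arousal, valence) on its own."""
    return [[math.tanh(z + bi) for z, bi in zip(matvec(w, x), b)] for x in frames]


def rnn_predict(frames: List[Vector], w_in: Matrix, w_rec: Matrix,
                w_out: Matrix, b: Vector) -> List[Vector]:
    """Elman-style recurrent model: a hidden state carries context across frames."""
    h = [0.0] * len(w_rec)  # initial hidden state
    outputs = []
    for x in frames:
        # The new state mixes the current frame with the previous state.
        h = [math.tanh(a + r) for a, r in zip(matvec(w_in, x), matvec(w_rec, h))]
        # tanh keeps both outputs inside the label range [-1.0, 1.0].
        outputs.append([math.tanh(z + bi) for z, bi in zip(matvec(w_out, h), b)])
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    n_feat, n_hidden = 4, 3
    frames = [[rng.gauss(0.0, 1.0) for _ in range(n_feat)] for _ in range(6)]
    bias = [0.0, 0.0]
    ff = ff_predict(frames, random_matrix(2, n_feat, rng), bias)
    rnn = rnn_predict(frames, random_matrix(n_hidden, n_feat, rng),
                      random_matrix(n_hidden, n_hidden, rng),
                      random_matrix(2, n_hidden, rng), bias)
    for t, (p_ff, p_rnn) in enumerate(zip(ff, rnn)):
        print(f"frame {t}: FF {p_ff[0]:+.3f}/{p_ff[1]:+.3f}  "
              f"RNN {p_rnn[0]:+.3f}/{p_rnn[1]:+.3f}")
```

With these random weights, changing only the first frame alters the recurrent prediction at later frames but leaves the feed-forward prediction at those frames unchanged; this temporal context is what a trained recurrent model can exploit.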
|
Current Status of Research Progress |
2: Research has progressed rather smoothly.
Reason
Currently, we are extending our experiments and improving our systems: searching for better parameter settings, increasing the amount of training data, and implementing and testing several new neural network structures. We are also exploring other practical applications of the methods and techniques developed during this project. These may include types of audio signals other than speech and music, with potential applications in automated internet services and healthcare informatics.
|
Strategy for Future Research Activity |
In the last year of this project, we plan to continue investigating advanced neural network architectures for better modeling of audio signals, especially speech and music. Exploring other high-impact applications, such as healthcare and internet services, is also on our agenda. From a theoretical point of view, we intend to study the links between Gaussian Processes (GPs) and neural networks, especially those with deep structures. Since a GP can be viewed as a neural network with a single hidden layer and an infinite number of hidden units, deep GP structures have the potential to exceed the capabilities of current DNNs.
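The GP/NN link can be checked numerically with a small sketch. The helper names below are hypothetical, and the choice of ReLU hidden units with standard normal weights scaled by 1/sqrt(N) is an assumption made for illustration. The code draws many random single-hidden-layer networks f(u) = (1/sqrt(N)) Σ_j v_j relu(w_j · u) and estimates the covariance of their outputs at two inputs. Under these assumptions the expected covariance equals the closed-form order-1 arc-cosine kernel implemented in `arccos_kernel`, and as the width N grows the central limit theorem makes the joint output distribution Gaussian, which is why the infinite-width network is a GP with that kernel.

```python
import math
import random
from typing import Sequence, Tuple


def relu(z: float) -> float:
    return z if z > 0.0 else 0.0


def arccos_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Closed form of E[relu(w.x) relu(w.y)] for w ~ N(0, I):
    |x||y| (sin t + (pi - t) cos t) / (2 pi), where t is the angle between x and y.
    """
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    t = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding error
    return nx * ny * (math.sin(t) + (math.pi - t) * math.cos(t)) / (2.0 * math.pi)


def random_net_outputs(x: Sequence[float], y: Sequence[float], width: int,
                       rng: random.Random) -> Tuple[float, float]:
    """Draw one random ReLU network of the given width and evaluate it at x and y."""
    fx = fy = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]  # input-to-hidden weights
        v = rng.gauss(0.0, 1.0)               # hidden-to-output weight
        fx += v * relu(sum(a * b for a, b in zip(w, x)))
        fy += v * relu(sum(a * b for a, b in zip(w, y)))
    scale = 1.0 / math.sqrt(width)  # keeps the output variance independent of width
    return fx * scale, fy * scale


def empirical_covariance(x: Sequence[float], y: Sequence[float], width: int,
                         n_nets: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Cov[f(x), f(y)]; the mean is zero because E[v] = 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_nets):
        fx, fy = random_net_outputs(x, y, width, rng)
        total += fx * fy
    return total / n_nets


if __name__ == "__main__":
    x, y = [1.0, 0.0], [0.6, 0.8]
    print("arc-cosine kernel:", arccos_kernel(x, y))
    for width in (1, 10, 100):
        est = empirical_covariance(x, y, width, n_nets=2000)
        print(f"width {width:>3}: Monte Carlo covariance {est:.3f}")
```

With the 1/sqrt(N) scaling the expected covariance is the same at every width; what the infinite-width limit adds is Gaussianity of the outputs. A deep GP stacks such layers, feeding the output of one GP into the next.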
|