2016 Fiscal Year Research-status Report
Speech based emotional and depressive mental state prediction using Gaussian Process state-space models
Project/Area Number | 15K00243 |
Research Institution | The University of Aizu |
Principal Investigator | MARKOV K, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (80394998) |
Co-Investigator (Kenkyū-buntansha) | MATSUI Tomoko, The Institute of Statistical Mathematics, Department of Statistical Modeling, Professor (10370090) |
Project Period (FY) | 2015-04-01 – 2018-03-31 |
Keywords | Speech emotion / Music emotion / Neural Networks / Gaussian Process |
Outline of Annual Research Achievements |
Emotions that humans perceive from speech and from music share a common psychological ground, so studies in this area can use either signal as a source of emotion information. Having experimented with speech last year, this year we turned to music data. Several music databases with emotion labels are available, and we used the MediaEval EmotionInMusic corpus for our study. As with speech, the emotion labels are given as arousal and valence values, i.e., continuously changing real numbers in the range [-1.0, 1.0]. Our preliminary experiments with Gaussian Processes and a Kalman filter gave somewhat unsatisfactory results, which prompted us to build a new system based on deep neural networks. We implemented emotion models with both feed-forward and recurrent networks and compared the results. As other studies have found, recurrent DNNs can achieve better performance, but in our case this held only for arousal; for valence the feed-forward DNN was slightly better. Overall, our results are consistent with other publications. In addition, we experimented with DNN models for noisy speech recognition and with DNNs that combine spectral and articulatory information for a phoneme recognition task. We achieved significant improvements in both cases, and our findings have been or are being published.
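To make the above comparison concrete, the following is a minimal sketch (not the project's actual code) of the two model types, written with the Keras API. The feature dimension, number of frames, layer sizes, and training loop are illustrative assumptions, and the data here is synthetic; real experiments would use acoustic features and per-frame arousal/valence annotations from the EmotionInMusic corpus.

# Minimal sketch: feed-forward vs. recurrent regressors mapping per-frame
# acoustic features to (arousal, valence) values in [-1.0, 1.0].
import numpy as np
from tensorflow.keras import layers, models

N_FRAMES, N_FEATS = 60, 260   # hypothetical: 60 half-second frames, 260 features

def build_ff_dnn():
    # Frame-wise feed-forward regressor: each frame is predicted independently.
    return models.Sequential([
        layers.Input(shape=(N_FEATS,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(2, activation="tanh"),  # (arousal, valence) in [-1, 1]
    ])

def build_recurrent_dnn():
    # Sequence regressor: the LSTM carries temporal context across frames.
    return models.Sequential([
        layers.Input(shape=(N_FRAMES, N_FEATS)),
        layers.LSTM(128, return_sequences=True),
        layers.TimeDistributed(layers.Dense(2, activation="tanh")),
    ])

# Training on synthetic stand-in data.
X = np.random.randn(32, N_FRAMES, N_FEATS).astype("float32")
Y = np.random.uniform(-1.0, 1.0, size=(32, N_FRAMES, 2)).astype("float32")

rnn = build_recurrent_dnn()
rnn.compile(optimizer="adam", loss="mse")
rnn.fit(X, Y, epochs=2, batch_size=8, verbose=0)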
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
Currently, we are extending our experiments and improving our systems: tuning parameter settings, increasing the amount of training data, and implementing and evaluating several new neural network structures. We are also looking at other possible practical applications of the methods and techniques developed during this project. These may involve other types of audio signals, not only speech or music, with potential applications in automatic internet services and healthcare informatics.
Strategy for Future Research Activity |
For the last year of this project, we plan to continue investigating advanced neural network architectures for better modeling of audio signals, especially speech and music. Looking at other high-impact applications such as healthcare and internet services is also on our agenda. From a theoretical point of view, we intend to work on the links between Gaussian Processes and neural networks, especially those with deep structures. Since a GP can be viewed as a one-layer neural network with an infinite number of hidden nodes, building deep GP structures has the potential to exceed the capabilities of current DNNs.
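As a compact statement of this link (a sketch of Neal's classical infinite-width result, not the project's own formulation), consider a one-hidden-layer network with i.i.d. priors: bias b ~ N(0, sigma_b^2), output weights v_j i.i.d. with zero mean and variance sigma_v^2 / H, and hidden-unit parameters u_j i.i.d. As H grows, the output converges to a Gaussian process whose covariance is set by the hidden-unit activations:

\[
f(x) = b + \sum_{j=1}^{H} v_j \, h(x; u_j), \qquad
\operatorname{Cov}\big(f(x), f(x')\big) \;\longrightarrow\; k(x, x') = \sigma_b^2 + \sigma_v^2 \, \mathbb{E}_{u}\big[\, h(x; u)\, h(x'; u) \,\big] \quad (H \to \infty).
\]

Composing such GP layers gives the deep GP structures mentioned above.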