2015 Fiscal Year Research-status Report
Speech based emotional and depressive mental state prediction using Gaussian Process state-space models
Project/Area Number |
15K00243
|
Research Institution | The University of Aizu |
Principal Investigator |
MARKOV K The University of Aizu, School of Computer Science and Engineering, Associate Professor (80394998)
|
Co-Investigator(Kenkyū-buntansha) |
MATSUI Tomoko The Institute of Statistical Mathematics, Department of an Inter-University Research Institute, Professor (10370090)
|
Project Period (FY) |
2015-04-01 – 2018-03-31
|
Keywords | Speech Emotion / Gaussian Process / State-Space Model / Particle filter |
Outline of Annual Research Achievements |
During the first year of this project, we developed two systems for estimating the emotional state of a speaker from his/her speech. Both systems follow the state-space modeling framework. The first, which serves as a baseline, uses linear state and measurement models and is known as the Kalman filter. The second uses Gaussian Process models; since the inference problem then has no analytic solution, we adopted the particle filter approach. For the evaluation experiments, we used the AVEC2014 database, which consists of recordings of 84 subjects. There are 100 recordings for model training and 100 recordings for evaluation. The database provides features extracted from the speech signal using the openSMILE toolkit. Since the dimension of these features is too high, we selected two subsets of 38 and 76 dimensions. The baseline Kalman filter (KF) and the Gaussian Process (GP) particle filter systems were evaluated for emotion prediction accuracy in terms of the Pearson correlation coefficient (R, higher is better) and the root mean squared error (RMSE, lower is better). The obtained results for R are KF: 0.088 and GP: 0.164, and for RMSE are KF: 0.169 and GP: 0.089. The GP system thus improves on the baseline by a factor of about 1.9 in both R and RMSE.
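As an illustration of the linear baseline and the two evaluation measures, the following is a minimal sketch of a Kalman filter with a one-dimensional latent emotion state, scored with the Pearson correlation coefficient and RMSE. It is not the implementation used in the project: the data are synthetic (not AVEC2014), and the function names, the 4-dimensional observation, and all model parameters are assumptions chosen only to make the example runnable.

```python
import numpy as np

def kalman_filter(y, A, C, state_cov, meas_cov, x0, P0):
    """Filter for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t (Gaussian noise)."""
    x, P = x0, P0
    means = []
    for y_t in y:
        # Prediction step: propagate mean and covariance through the linear state model.
        x = A @ x
        P = A @ P @ A.T + state_cov
        # Update step: correct the prediction with the current measurement.
        S = C @ P @ C.T + meas_cov
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (y_t - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        means.append(x.copy())
    return np.array(means)

def pearson_r(pred, target):
    """Pearson correlation coefficient between prediction and reference."""
    return float(np.corrcoef(pred, target)[0, 1])

def rmse(pred, target):
    """Root mean squared error between prediction and reference."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic stand-in data (assumed values, not the AVEC2014 features).
rng = np.random.default_rng(0)
T, D = 200, 4
A = np.array([[0.95]])          # state transition
C = np.ones((D, 1))             # measurement matrix
state_cov = np.array([[0.1]])   # process noise covariance
meas_cov = 0.25 * np.eye(D)     # measurement noise covariance

x_true = np.zeros(T)
y = np.zeros((T, D))
x = 0.0
for t in range(T):
    x = 0.95 * x + rng.normal(scale=np.sqrt(0.1))
    x_true[t] = x
    y[t] = x + rng.normal(scale=0.5, size=D)

est = kalman_filter(y, A, C, state_cov, meas_cov, x0=np.zeros(1), P0=np.eye(1))[:, 0]
print("R =", pearson_r(est, x_true), "RMSE =", rmse(est, x_true))
```

In this sketch the latent state plays the role of the emotion label and the observations play the role of the acoustic features; a higher R and a lower RMSE both indicate a better prediction.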
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
We are currently analyzing the results of our experiments and working on improvements to the system in order to achieve even better emotion prediction performance. We expect gains from several directions: improved feature pre-processing, a search for better proposal functions for the particle filter, and the combination of Gaussian Process models with other state-of-the-art modeling approaches.
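To show where the proposal function enters a particle filter, the following is a minimal sketch of a bootstrap particle filter, in which the proposal is simply the state transition model. It is not the project's system: the nonlinear function `transition_mean` is a hypothetical stand-in for a learned Gaussian Process mean, the state and measurement are scalar, and the data and noise levels are synthetic assumptions.

```python
import numpy as np

def transition_mean(x):
    # Hypothetical nonlinear dynamics; stands in for a learned GP predictive mean.
    return 0.9 * x + 0.5 * np.sin(x)

def bootstrap_particle_filter(y, n_particles, q_std, r_std, rng):
    """Estimate x_t for x_t = f(x_{t-1}) + w_t, y_t = x_t + v_t."""
    particles = rng.normal(size=n_particles)
    estimates = np.empty(len(y))
    for t, y_t in enumerate(y):
        # Proposal step: sample from the transition prior (the bootstrap choice).
        # A better proposal would also use y_t and would change the weights below.
        particles = transition_mean(particles) + rng.normal(scale=q_std, size=n_particles)
        # Weighting step: with the bootstrap proposal, weights are proportional
        # to the measurement likelihood.
        log_w = -0.5 * ((y_t - particles) / r_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates[t] = np.sum(w * particles)
        # Multinomial resampling to counter weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return estimates

# Synthetic data generated from the same assumed model.
rng = np.random.default_rng(1)
T, q_std, r_std = 300, 0.3, 0.5
x_true = np.empty(T)
x = 0.0
for t in range(T):
    x = transition_mean(x) + rng.normal(scale=q_std)
    x_true[t] = x
y = x_true + rng.normal(scale=r_std, size=T)

est = bootstrap_particle_filter(y, n_particles=500, q_std=q_std, r_std=r_std, rng=rng)
err_pf = float(np.sqrt(np.mean((est - x_true) ** 2)))
err_obs = float(np.sqrt(np.mean((y - x_true) ** 2)))
print("RMSE filter =", err_pf, "RMSE raw observations =", err_obs)
```

The bootstrap proposal ignores the current measurement when moving the particles, which is why it can waste particles when the measurement is informative; this is the motivation for searching for better proposal functions.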
|
Strategy for Future Research Activity |
For the future, we plan to research and develop an emotion recognition system in which Gaussian Processes are fused with Deep Neural Networks (DNNs). DNNs have been shown to achieve very high performance on various classification and regression tasks, and we expect that by combining the strengths of DNNs and Gaussian Processes we can develop a high-performance system. DNNs can be incorporated into the state-space modeling framework as a feature pre-processing module, as a measurement model, or even as a temporal-measurement model. In the last case, a recurrent DNN such as a Long Short-Term Memory (LSTM) network can be utilized. If possible, we will evaluate our systems on different databases in order to investigate the effect of data variation on the models and to show that our methodology is effective across languages.
|