Summary of Research Achievements |
During the first year of this project, we developed two systems for estimating a speaker's emotional state from his or her speech. Both systems follow the state-space modeling framework. The first, which serves as a baseline, uses linear state and measurement models and is known as the Kalman filter. The second uses Gaussian Process models; because the inference problem then has no analytic solution, we adopted the particle filter approach. For the evaluation experiments, we used the AVEC2014 database, which consists of recordings of 84 subjects, with 100 recordings for model training and 100 recordings for evaluation. The database provides features extracted from the speech signal with the openSMILE toolkit. Because the dimensionality of the full feature set is too high, we selected two subsets of 38 and 76 dimensions. The baseline Kalman filter (KF) and the Gaussian Process (GP) particle filter systems were evaluated for emotion prediction accuracy in terms of the Pearson correlation coefficient (R) and the root mean squared error (RMSE). The results for R are KF: 0.088 and GP: 0.164, and for RMSE are KF: 0.169 and GP: 0.089. The GP system is thus nearly twice as good as the baseline on both measures (about 1.9 times higher R and about 1.9 times lower RMSE).
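As an illustration of the baseline and of the two evaluation measures, the following is a minimal sketch of a linear Kalman filter together with R and RMSE. It is not the implementation used in the project: the scalar latent state, the two-dimensional observation, the synthetic data, and all parameter values (`A`, `C`, `Q`, `R`) are assumptions chosen only to keep the example small.

```python
import numpy as np

def kalman_filter(Y, A, C, Q, R, x0, P0):
    """Filtered state means for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t."""
    x, P = x0.astype(float), P0.astype(float)
    means = []
    for y in Y:
        # Prediction step: propagate mean and covariance through the state model.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update step: Kalman gain K = P C^T S^{-1} (S and P are symmetric).
        S = C @ P @ C.T + R
        K = np.linalg.solve(S, C @ P).T
        x = x + K @ (y - C @ x)
        P = (np.eye(x.size) - K @ C) @ P
        means.append(x)
    return np.array(means)

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))

def rmse(a, b):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic data (assumed values, not AVEC2014): a slowly varying scalar
# "emotion" state observed through two noisy feature channels.
rng = np.random.default_rng(0)
T = 500
A = np.array([[0.95]])
C = np.array([[1.0], [0.5]])
Q = np.array([[0.01]])
R = np.eye(2)

x_true = np.zeros(T)
Y = np.zeros((T, 2))
x = 0.0
for t in range(T):
    x = 0.95 * x + rng.normal(scale=0.1)
    x_true[t] = x
    Y[t] = C[:, 0] * x + rng.normal(size=2)

est = kalman_filter(Y, A, C, Q, R, np.zeros(1), np.eye(1))[:, 0]
print("R =", pearson_r(est, x_true), "RMSE =", rmse(est, x_true))
```

Higher R and lower RMSE both indicate better prediction, which is why the two measures move in opposite directions in the results above.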
|
Current Status of Research Progress (Category) |
2: Progressing more or less smoothly
Reason
Currently, we are analyzing the results of our experiments and working on improvements to the system in order to achieve even better emotion prediction performance. We expect gains in several directions: improved feature pre-processing, a search for better proposal functions for the particle filter, and the combination of Gaussian Process models with other state-of-the-art modeling approaches.
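To make the role of the proposal function concrete, the following is a minimal sketch of a bootstrap particle filter, the simplest variant, in which the state transition model itself serves as the proposal. It is not the project's system: the hand-written `transition` and `log_likelihood` functions are hypothetical stand-ins for the Gaussian Process models, and the data are synthetic.

```python
import numpy as np

def bootstrap_particle_filter(Y, transition, log_likelihood, particles, rng):
    """Posterior-mean state estimates for a scalar state-space model."""
    n = particles.size
    means = np.empty(len(Y))
    for t, y in enumerate(Y):
        # Proposal step: the bootstrap filter samples from the transition model.
        particles = transition(particles, rng)
        # Weighting step: with this proposal, weights reduce to the likelihood.
        logw = log_likelihood(y, particles)
        w = np.exp(logw - logw.max())  # subtract the max for numerical stability
        w /= w.sum()
        means[t] = w @ particles
        # Multinomial resampling to counter weight degeneracy.
        particles = particles[rng.choice(n, size=n, p=w)]
    return means

def transition(x, rng):
    # Hypothetical state model (stand-in for a GP state model).
    return 0.95 * x + 0.1 * rng.normal(size=x.shape)

def log_likelihood(y, x):
    # Hypothetical nonlinear measurement model with unit-variance Gaussian noise.
    return -0.5 * (y - np.tanh(x)) ** 2

# Synthetic data generated from the same assumed models.
rng = np.random.default_rng(1)
T = 300
x_true = np.zeros(T)
Y = np.zeros(T)
x = np.zeros(1)
for t in range(T):
    x = transition(x, rng)
    x_true[t] = x[0]
    Y[t] = np.tanh(x[0]) + rng.normal()

pf_est = bootstrap_particle_filter(Y, transition, log_likelihood,
                                   rng.normal(size=500), rng)
print("RMSE =", np.sqrt(np.mean((pf_est - x_true) ** 2)))
```

A different proposal would replace the sampling line in the proposal step, and the weights would then be multiplied by the ratio of the transition density to the proposal density.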
|
Strategy for Future Research |
In the future, we plan to research and develop an emotion recognition system in which Gaussian Processes are fused with Deep Neural Networks (DNNs). DNNs have been shown to achieve very high performance on various classification and regression tasks, and we expect that by combining the strengths of DNNs and Gaussian Processes we can develop a high-performance system. DNNs can be incorporated into the state-space modeling framework as a feature pre-processing module, as a measurement model, or even as a temporal-measurement model. In the last case, a recurrent network such as a Long Short-Term Memory (LSTM) network can be used. If possible, we will also evaluate our systems on different databases in order to investigate the effect of data variation on the models and to show that our methodology is effective across languages.
|