2015 Fiscal Year Research-status Report

Speech based emotional and depressive mental state prediction using Gaussian Process state-space models

Research Project

Project/Area Number	15K00243
Research Institution	The University of Aizu
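Principal Investigator	MARKOV K, The University of Aizu, School of Computer Science and Engineering, Associate Professor (80394998)
Co-Investigator(Kenkyū-buntansha)	Tomoko Matsui, The Institute of Statistical Mathematics, Department of an Inter-University Research Institute, etc., Professor (10370090)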
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	Speech Emotion / Gaussian Process / State-Space Model / Particle filter
Outline of Annual Research Achievements	During the first year of this project we developed two systems for estimating the emotional state of a speaker from his/her speech. Both systems follow the state-space modeling framework. The first one, which serves as a baseline, uses linear state and measurement models and is known as the Kalman Filter. The second one uses Gaussian Process models; since there is no analytic solution to the inference problem in that case, we adopted the Particle filter approach. For the evaluation experiments, we used the AVEC2014 database, which consists of recordings of 84 subjects. There are 100 recordings for model training and 100 recordings for evaluation. This database provides features extracted from the speech signal using the openSMILE toolkit. Because the dimensionality of these features is too high, we selected two subsets of 38 and 76 dimensions. The baseline Kalman Filter (KF) and the Gaussian Process (GP) particle filter systems were evaluated for emotion prediction accuracy in terms of the Pearson correlation coefficient (R) and the root mean squared error (RMSE). The obtained results for R are: KF - 0.088, GP - 0.164, and for RMSE: KF - 0.169, GP - 0.089. The GP system is thus nearly twice as good as the baseline on both measures (about 1.9 times higher R and about 1.9 times lower RMSE).
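The filtering and scoring steps described in the outline can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a toy scalar linear-Gaussian state-space model with hand-picked parameters (a, q_std, r_std), whereas the reported GP system uses Gaussian Process state and measurement models and AVEC2014 features. All function names here are hypothetical. The sketch runs a scalar Kalman filter and a bootstrap particle filter (proposal = transition model) on the same simulated sequence and scores both with Pearson R and RMSE.

```python
import math
import random


def simulate(T, a, q_std, r_std, rng):
    """Toy model (assumption): x_t = a*x_{t-1} + noise, y_t = x_t + noise."""
    xs, ys, x = [], [], 0.0
    for _ in range(T):
        x = a * x + rng.gauss(0.0, q_std)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r_std))
    return xs, ys


def kalman_filter(ys, a, q_std, r_std):
    """Scalar Kalman filter: exact posterior mean for the linear-Gaussian toy model."""
    mean, var, out = 0.0, 1.0, []
    for y in ys:
        mean, var = a * mean, a * a * var + q_std ** 2          # predict
        gain = var / (var + r_std ** 2)                         # Kalman gain
        mean, var = mean + gain * (y - mean), (1.0 - gain) * var  # update
        out.append(mean)
    return out


def particle_filter(ys, a, q_std, r_std, n_particles, rng):
    """Bootstrap particle filter: propagate, weight by likelihood, resample."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    out = []
    for y in ys:
        # Propose new particles from the state transition model.
        particles = [a * p + rng.gauss(0.0, q_std) for p in particles]
        # Weight each particle by the (unnormalized) Gaussian measurement likelihood.
        weights = [math.exp(-0.5 * ((y - p) / r_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed: fall back to uniform weights
            weights, total = [1.0] * n_particles, float(n_particles)
        # State estimate: weighted mean of the particles.
        out.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return out


def pearson_r(pred, true):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(true)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    return cov / (sp * st)


def rmse(pred, true):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def run_demo(seed=0):
    """Return (R, RMSE) for raw observations, the KF and the particle filter."""
    rng = random.Random(seed)
    a, q_std, r_std = 0.9, 0.3, 1.0  # hypothetical toy parameters
    xs, ys = simulate(300, a, q_std, r_std, rng)
    kf = kalman_filter(ys, a, q_std, r_std)
    pf = particle_filter(ys, a, q_std, r_std, 500, rng)
    return {
        "obs": (pearson_r(ys, xs), rmse(ys, xs)),
        "kf": (pearson_r(kf, xs), rmse(kf, xs)),
        "pf": (pearson_r(pf, xs), rmse(pf, xs)),
    }


if __name__ == "__main__":
    for name, (r, err) in run_demo().items():
        print(f"{name}: R={r:.3f} RMSE={err:.3f}")
```

On this linear toy model the two filters should score similarly, since the Kalman filter is already optimal there; the particle filter becomes necessary only when the state or measurement model is non-linear or non-Gaussian, as with the Gaussian Process models in the report.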
Current Status of Research Progress	2: Research has, on the whole, progressed more than originally planned. Reason: Currently, we are analyzing the results of our experiments and working on improvements to the system in order to achieve even better emotion prediction performance. We expect to achieve this goal in several directions, such as improved feature pre-processing, a search for better proposal functions for the Particle filter, and combining Gaussian Process models with other state-of-the-art modeling approaches.
Strategy for Future Research Activity	For the future, we plan to research and develop an emotion recognition system in which Gaussian Processes are fused with Deep Neural Networks (DNN). DNNs have been shown to achieve very high performance on various classification and regression tasks, and we expect that by combining the strengths of DNNs and Gaussian Processes we can develop a high-performance system. DNNs can be incorporated in the state-space modeling framework as a feature pre-processing module, as a measurement model, or even as a temporal-measurement model. In the last case, a recurrent DNN such as a Long Short-Term Memory (LSTM) network can be utilized. If possible, we will try to evaluate our systems on different databases in order to investigate the effect of data variation on the models and to show that our methodology is effective for various languages.

Research Products
(2 results)

All: Presentation (1 result), Book (1 result)

[Presentation] Dynamic Speech Emotion Recognition with State-Space Models (2015)
- Author(s)
  Konstantin Markov, Tomoko Matsui
- Organizer
  European Signal Processing Conference
- Place of Presentation
  Nice, France
- Year and Date
  2015-08-31 – 2015-09-04
[Book] Modern Methodology and Applications in Spatial-Temporal Modeling, Chapter 3 (2015)
- Author(s)
  Konstantin Markov, Tomoko Matsui
- Total Pages
  109
- Publisher
  Springer