FY2015 Research Status Report

Speech based emotional and depressive mental state prediction using Gaussian Process state-space models

Research Project

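Project/Area Number	15K00243
Research Institution	University of Aizu
Principal Investigator	MARKOV K, University of Aizu, School of Computer Science and Engineering, Associate Professor (80394998)
Co-Investigator	Tomoko Matsui, The Institute of Statistical Mathematics, department of an inter-university research institute, Professor (10370090)
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	Speech Emotion / Gaussian Process / State-Space Model / Particle filter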
Outline of Annual Research Achievements	During the first year of this project, we developed two systems for estimating the emotional state of a speaker from his/her speech. Both systems follow the state-space modeling framework. The first, which serves as a baseline, uses linear state and measurement models and is known as the Kalman filter. The second uses Gaussian Process models; since the inference problem then has no analytic solution, we adopted the particle filter approach. For the evaluation experiments, we used the AVEC2014 database, which consists of recordings of 84 subjects. There are 100 recordings for model training and 100 recordings for evaluation. This database provides features extracted from the speech signal using the openSMILE toolkit. Since the dimension of the full feature set is too high, we selected two subsets of 38 and 76 dimensions. The baseline Kalman filter (KF) and the Gaussian Process (GP) particle filter systems were evaluated for emotion prediction accuracy in terms of the Pearson correlation coefficient (R) and the root mean squared error (RMSE). The obtained results for R are KF: 0.088 and GP: 0.164, and for RMSE are KF: 0.169 and GP: 0.089. The GP system is thus close to two times better than the baseline on both measures (about 1.9 times higher R and about 1.9 times lower RMSE).
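The two filtering schemes and the two evaluation measures named above can be illustrated with a minimal sketch. It is not the system used in the project: the state and observation are scalars, the data are synthetic, and the functions `f` and `h` passed to the particle filter are plain Python callables standing in for Gaussian Process state and measurement models. All function names, parameter values, and the `demo` helper are assumptions made for illustration. The particle filter is the bootstrap variant, which uses the state transition as its proposal.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient (R) between two 1-D sequences."""
    yt = np.asarray(y_true, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    yt = yt - yt.mean()
    yp = yp - yp.mean()
    return float(np.sum(yt * yp) / np.sqrt(np.sum(yt ** 2) * np.sum(yp ** 2)))


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def kalman_filter(obs, a, c, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = a*x_{t-1} + w_t, y_t = c*x_t + v_t,
    with Var(w_t) = q and Var(v_t) = r. Returns the filtered state means."""
    m, p = m0, p0
    means = np.empty(len(obs))
    for t, y in enumerate(obs):
        # Prediction step: propagate mean and variance through the linear model.
        m_pred = a * m
        p_pred = a * a * p + q
        # Update step: correct the prediction with the new observation.
        gain = p_pred * c / (c * c * p_pred + r)
        m = m_pred + gain * (y - c * m_pred)
        p = (1.0 - gain * c) * p_pred
        means[t] = m
    return means


def bootstrap_particle_filter(obs, f, h, q, r, n_particles=500, rng=None):
    """Bootstrap particle filter for x_t = f(x_{t-1}) + w_t, y_t = h(x_t) + v_t.
    f and h are arbitrary vectorized callables (placeholders for GP models)."""
    rng = np.random.default_rng() if rng is None else rng
    particles = rng.normal(0.0, 1.0, size=n_particles)  # samples from the prior
    means = np.empty(len(obs))
    for t, y in enumerate(obs):
        # Propose new particles from the state transition model.
        particles = f(particles) + rng.normal(0.0, np.sqrt(q), size=n_particles)
        # Weight particles by the Gaussian observation likelihood.
        log_w = -0.5 * (y - h(particles)) ** 2 / r
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means[t] = np.sum(w * particles)
        # Multinomial resampling to counter weight degeneracy.
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return means


def demo(seed=0, n_steps=500):
    """Run both filters on synthetic linear-Gaussian data and score them."""
    rng = np.random.default_rng(seed)
    a, c, q, r = 0.95, 1.0, 0.01, 0.1  # illustrative values only
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(q))
    y = c * x + rng.normal(0.0, np.sqrt(r), size=n_steps)
    kf = kalman_filter(y, a, c, q, r)
    pf = bootstrap_particle_filter(y, lambda s: a * s, lambda s: c * s, q, r, rng=rng)
    return {
        "raw_rmse": rmse(x, y),
        "kf_rmse": rmse(x, kf), "kf_r": pearson_r(x, kf),
        "pf_rmse": rmse(x, pf), "pf_r": pearson_r(x, pf),
    }
```

Calling `demo()` returns a dictionary of R and RMSE values for both filters. On this linear-Gaussian toy problem the particle filter should only approximate the Kalman filter, which is optimal here; the particle filter becomes useful when `f` and `h` are nonlinear, as with Gaussian Process models.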
Current Status of Research Progress	2: Progressing mostly as planned. Reason: Currently, we are analyzing the results of our experiments and working on improvements to the system in order to achieve even better emotion prediction performance. There are several directions in which we expect to achieve this goal, such as improved feature pre-processing, a search for better proposal functions for the particle filter, and combining Gaussian Process models with other state-of-the-art modeling approaches.
Strategy for Future Research Activity	In the future, we plan to research and develop an emotion recognition system in which Gaussian Processes are fused with Deep Neural Networks (DNNs). DNNs have been shown to achieve very high performance on various classification and regression tasks, and we expect that by combining the strengths of DNNs and Gaussian Processes we can develop a high-performance system. DNNs can be incorporated in the state-space modeling framework as a feature pre-processing module, as a measurement model, or even as a temporal-measurement model. In the last case, a recurrent DNN such as a Long Short-Term Memory (LSTM) network can be utilized. If possible, we will evaluate our systems on different databases in order to investigate the effect of data variation on the models and to show that our methodology is effective for various languages.

Research Products
(2 results)

All: Presentation (1 result), Book (1 result)

[Presentation] Dynamic Speech Emotion Recognition with State-Space Models (2015)
- Author(s)/Presenter(s)
  Konstantin Markov, Tomoko Matsui
- Conference
  European Signal Processing Conference
- Place of Presentation
  Nice, France
- Date
  2015-08-31 – 2015-09-04
[Book] Modern Methodology and Applications in Spatial-Temporal Modeling, Chapter 3 (2015)
- Author(s)/Presenter(s)
  Konstantin Markov, Tomoko Matsui
- Total Pages
  109
- Publisher
  Springer