
2015 Fiscal Year Research-status Report

Speech based emotional and depressive mental state prediction using Gaussian Process state-space models

Research Project

Project/Area Number 15K00243
Research Institution The University of Aizu

Principal Investigator

MARKOV K  The University of Aizu, School of Computer Science and Engineering, Associate Professor (80394998)

Co-Investigator (Kenkyū-buntansha) MATSUI Tomoko  The Institute of Statistical Mathematics, Department of an Inter-University Research Institute, etc., Professor (10370090)
Project Period (FY) 2015-04-01 – 2018-03-31
Keywords Speech Emotion / Gaussian Process / State-Space Model / Particle filter
Outline of Annual Research Achievements

During the first year of this project we developed two systems for estimating the emotional state of a speaker from his/her speech. Both systems follow the state-space modeling framework. The first one, which serves as a baseline, uses linear state and measurement models and is known as the Kalman Filter. The second one uses Gaussian Process models; since the resulting inference problem has no analytic solution, we adopted the Particle filter approach.
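As a rough illustration of the linear baseline described above, the following is a minimal Kalman filter sketch in Python with NumPy. It is not the project's implementation: the one-dimensional "emotion" state, the three-dimensional observation vector, and all matrices (`A`, `C`, `Q`, `R`) are made-up values for a synthetic demo, standing in for the real emotion labels and acoustic features.

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, x0, P0):
    """Filter observations y under an assumed linear-Gaussian model:
    x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)   (state model)
    y_t = C x_t + v_t,      v_t ~ N(0, R)   (measurement model)
    Returns the filtered state means, one row per time step."""
    n = x0.shape[0]
    x, P = x0.copy(), P0.copy()
    means = np.zeros((len(y), n))
    for t, y_t in enumerate(y):
        # Prediction step: push the previous estimate through the state model.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update step: correct the prediction with the new observation.
        S = C @ P @ C.T + R                  # innovation covariance
        K = np.linalg.solve(S, C @ P).T      # Kalman gain, K = P C^T S^{-1}
        x = x + K @ (y_t - C @ x)
        P = (np.eye(n) - K @ C) @ P
        means[t] = x
    return means

# Synthetic demo (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
T = 200
A = np.array([[0.95]])
C = np.array([[1.0], [0.5], [-0.5]])
Q = np.array([[0.05]])
R = 0.1 * np.eye(3)

x = np.zeros(1)
x_true = np.zeros((T, 1))
y = np.zeros((T, 3))
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(1), Q)
    x_true[t] = x
    y[t] = C @ x + rng.multivariate_normal(np.zeros(3), R)

est = kalman_filter(y, A, C, Q, R, np.zeros(1), np.eye(1))
```

Here the hidden state plays the role of the emotion value to be tracked and `y` plays the role of the per-frame speech features; in a real system the model matrices would be estimated from training data rather than fixed by hand.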
For the evaluation experiments, we used the AVEC2014 database, which consists of recordings of 84 subjects. There are 100 recordings for model training and 100 recordings for evaluation. This database provides features extracted from the speech signal using the openSMILE toolkit. Because the full feature set is very high-dimensional, we selected two subsets of 38 and 76 dimensions. The baseline Kalman Filter (KF) and the Gaussian Process (GP) particle filter systems were evaluated for emotion prediction accuracy in terms of the Pearson correlation coefficient (R) and the root mean squared error (RMSE).
The obtained results for R are KF: 0.088, GP: 0.164, and for RMSE are KF: 0.169, GP: 0.089. The GP system is thus close to two times better than the baseline in both measures (a factor of about 1.9 each).
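For reference, the two evaluation measures can be computed as in the short Python sketch below. The helper names `pearson_r` and `rmse` are illustrative and not taken from the project; the four numbers at the end are the values reported above, and the ratios are simple arithmetic on them.

```python
import numpy as np

def pearson_r(pred, target):
    """Pearson correlation coefficient between predictions and targets."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pc = pred - pred.mean()
    tc = target - target.mean()
    return float(pc @ tc / np.sqrt((pc @ pc) * (tc @ tc)))

def rmse(pred, target):
    """Root mean squared error between predictions and targets."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Values reported in this report.
r_kf, r_gp = 0.088, 0.164
rmse_kf, rmse_gp = 0.169, 0.089

r_gain = r_gp / r_kf            # about 1.86 (higher R is better)
rmse_gain = rmse_kf / rmse_gp   # about 1.90 (lower RMSE is better)
print(f"R improvement factor:    {r_gain:.2f}")
print(f"RMSE improvement factor: {rmse_gain:.2f}")
```

A higher R and a lower RMSE both indicate better prediction, which is why the two ratios are formed in opposite directions.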

Current Status of Research Progress

2: Research has progressed on the whole more than it was originally planned.

Reason

Currently, we are analyzing the results of our experiments and working on improvements to the system in order to achieve even better emotion prediction performance. We expect gains from several directions: improved feature pre-processing, a search for better proposal functions for the Particle filter, and combining Gaussian Process models with other state-of-the-art modeling approaches.
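To make the role of the proposal function concrete, here is a minimal bootstrap particle filter sketch in Python with NumPy. In the bootstrap variant the proposal is simply the state transition model itself, which is the simplest choice and the one that a better proposal would replace. The nonlinear toy transition and the Gaussian likelihood below are assumptions chosen for a self-contained demo; they stand in for, and are not, the Gaussian Process models used in the project.

```python
import numpy as np

def bootstrap_particle_filter(y, transition, log_likelihood, init_particles, rng):
    """Bootstrap particle filter for a scalar state.
    Returns the weighted posterior mean of the state at every time step."""
    particles = init_particles.copy()
    n = particles.shape[0]
    means = np.zeros(len(y))
    for t, y_t in enumerate(y):
        # Proposal step: the bootstrap filter samples from the transition model.
        particles = transition(particles, rng)
        # Weighting step: score each particle against the observation.
        logw = log_likelihood(y_t, particles)
        w = np.exp(logw - logw.max())   # subtract the max for numerical stability
        w /= w.sum()
        means[t] = np.sum(w * particles)
        # Resampling step (multinomial): duplicate likely particles, drop unlikely ones.
        particles = particles[rng.choice(n, size=n, p=w)]
    return means

# Toy nonlinear model (illustrative assumption, not the project's GP model).
def transition(x, rng):
    return 0.8 * x + np.sin(x) + 0.3 * rng.normal(size=x.shape)

def log_likelihood(y_t, x):
    return -0.5 * ((y_t - x) / 0.3) ** 2

rng = np.random.default_rng(0)
T = 200
x_true = np.zeros(T)
y = np.zeros(T)
x = np.zeros(1)
for t in range(T):
    x = transition(x, rng)
    x_true[t] = x[0]
    y[t] = x[0] + 0.3 * rng.normal()

est = bootstrap_particle_filter(
    y, transition, log_likelihood, rng.normal(size=500), rng
)
err = float(np.sqrt(np.mean((est - x_true) ** 2)))
```

A proposal that also takes the current observation into account would typically waste fewer particles in low-likelihood regions, at the cost of an extra importance-weight correction term.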

Strategy for Future Research Activity

For the future, we plan to research and develop an emotion recognition system in which Gaussian Processes are fused with Deep Neural Networks (DNN). DNNs have been shown to achieve very high performance on various classification and regression tasks, and we expect that by combining the strengths of DNNs and Gaussian Processes we can develop a high-performance system. DNNs can be incorporated in the state-space modeling framework as a feature pre-processing module, as a measurement model, or even as a temporal-measurement model. In the last case, a recurrent DNN such as a Long Short-Term Memory (LSTM) network can be utilized. If possible, we will try to evaluate our systems on different databases in order to investigate the effect of data variation on the models and to show that our methodology is effective across languages.

  • Research Products

    (2 results)


  • [Presentation] Dynamic Speech Emotion Recognition with State-Space Models (2015)

    • Author(s)
      Konstantin Markov, Tomoko Matsui
    • Organizer
      European Signal Processing Conference
    • Place of Presentation
      Nice, France
    • Year and Date
      2015-08-31 – 2015-09-04
  • [Book] Modern Methodology and Applications in Spatial-Temporal Modeling, Chapter 3 (2015)

    • Author(s)
      Konstantin Markov, Tomoko Matsui
    • Total Pages
      109
    • Publisher
      Springer


Published: 2017-01-06  
