FY2016 Research Status Report

Speech based emotional and depressive mental state prediction using Gaussian Process state-space models

Research Project

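Project/Area Number	15K00243
Research Institution	University of Aizu
Principal Investigator	MARKOV K, University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (80394998)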
Co-Investigator	Tomoko Matsui, The Institute of Statistical Mathematics, Department of Statistical Modeling, Professor (10370090)
Project Period (FY)	2015-04-01 – 2018-03-31
Keywords	Speech emotion / Music emotion / Neural Networks / Gaussian Process
Summary of Research Achievements	Emotions that humans perceive in speech and in music share common psychological ground, so studies in this area can use either speech or music as the source of emotion information. Last year we experimented with speech; this year we decided to use music data. Several music databases with emotion labels are available, and we used the MediaEval EmotionInMusic corpus for our study. As in the case of speech, the emotion labels are given as arousal and valence values, which are continuously changing real numbers in the range [-1.0, 1.0]. Our preliminary experiments using Gaussian Processes and the Kalman filter gave somewhat unsatisfactory results, which prompted us to build a new system using deep neural networks (DNNs). We implemented emotion models with both feed-forward (FF) and recurrent networks and compared the results. As other similar studies have found, recurrent DNNs can achieve better performance, but in our case this was true only for arousal; for valence, the FF DNN was slightly better. Overall, the results do not contradict other publications. In addition, we experimented with DNN models for recognition of noisy speech, and with DNNs as a model for combining spectral and articulatory information in a phoneme recognition task. We achieved significant improvements in both cases, and our findings have been published or are being published.
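The structural difference between the feed-forward and recurrent models compared above can be illustrated with a minimal sketch. It is not the system used in the project: the layer sizes, the random untrained weights, the feature dimensionality, and all function and parameter names (`init_params`, `feed_forward_predict`, `recurrent_predict`) are assumptions made for illustration only. A `tanh` output layer is assumed here simply because it keeps the predicted arousal and valence inside the label range [-1.0, 1.0].

```python
import numpy as np

def init_params(n_in, n_hidden, n_out=2, seed=0):
    """Random, untrained weights (illustration only; sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "W_in": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "W_rec": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "b_h": np.zeros(n_hidden),
        "W_out": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b_out": np.zeros(n_out),
    }

def feed_forward_predict(x, p):
    """Map each frame of x (T, n_in) to (arousal, valence) independently."""
    h = np.tanh(x @ p["W_in"] + p["b_h"])
    return np.tanh(h @ p["W_out"] + p["b_out"])  # values lie in [-1, 1]

def recurrent_predict(x, p):
    """Same mapping, but the hidden state carries context from earlier frames."""
    h = np.zeros(p["W_rec"].shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ p["W_in"] + h @ p["W_rec"] + p["b_h"])
        outputs.append(np.tanh(h @ p["W_out"] + p["b_out"]))
    return np.stack(outputs)

# Hypothetical input: 60 frames of 20-dimensional acoustic features.
features = np.random.default_rng(1).normal(size=(60, 20))
params = init_params(n_in=20, n_hidden=16)
ff = feed_forward_predict(features, params)   # shape (60, 2)
rnn = recurrent_predict(features, params)     # shape (60, 2)
```

In this sketch the feed-forward prediction for a frame depends only on that frame, while the recurrent prediction also depends on all preceding frames, which is one plausible reason recurrent models are often reported to track slowly varying emotion labels better.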
Current Status of Research Progress (Category)	2: Progressing rather smoothly. Reason: We are currently extending our experiments and improving our systems in order to obtain better parameter settings, increase the amount of training data, and implement and experiment with several new neural network structures. We are also looking at other possible practical applications of the methods and techniques developed during this project. These may include other types of audio signals, not only speech or music, with potential applications in automatic internet services and healthcare informatics.
Strategy for Future Research Activity	For the last year of this project, we plan to continue our investigation of advanced neural network architectures for better modeling of audio signals, especially speech and music. Looking at other high-impact applications such as healthcare and internet services is also on our agenda. From the theoretical point of view, we intend to work on the links between Gaussian Processes (GPs) and neural networks, especially those with deep structures. Since a GP can be viewed as a one-layer neural network with an infinite number of nodes, building deep GP structures has the potential to exceed current DNN capabilities.
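The view of a GP as a one-layer network with infinitely many nodes can be made concrete with the standard central-limit argument sketched below. The notation is an assumption introduced here and does not come from the report: $H$ is the number of hidden units, $h$ is a bounded activation function, the hidden-unit parameters $u_j$ are drawn i.i.d. from some prior, and the output weights $v_j$ and bias $b$ are independent zero-mean Gaussians.

```latex
% One-hidden-layer network with H units (assumed notation)
f(x) = b + \sum_{j=1}^{H} v_j \, h(x; u_j),
\qquad b \sim \mathcal{N}(0, \sigma_b^2),
\quad v_j \sim \mathcal{N}\!\left(0, \tfrac{\sigma_v^2}{H}\right)

% Mean and covariance of the prior over functions (valid for any H)
\mathbb{E}[f(x)] = 0,
\qquad
\operatorname{Cov}\bigl(f(x), f(x')\bigr)
  = \sigma_b^2 + \sigma_v^2 \, \mathbb{E}_{u}\bigl[h(x; u)\, h(x'; u)\bigr]

% As H -> infinity, the sum of i.i.d. terms becomes Gaussian at any finite set of inputs
f \sim \mathcal{GP}(0, k),
\qquad
k(x, x') = \sigma_b^2 + \sigma_v^2 \, \mathbb{E}_{u}\bigl[h(x; u)\, h(x'; u)\bigr]
```

The scaling of the output-weight variance by $1/H$ is what keeps the covariance finite as $H$ grows; the Gaussian form of the joint distribution holds only in the limit.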

Research Products
(2 results)

All presentations (2) (of which international conferences: 2)

[Presentation] Articulatory and Spectrum Features Integration using Generalized Distillation Framework (2016)
- Author(s)/Presenter(s)
  J. Yu, K. Markov, T. Matsui
- Conference
  IEEE Int. Workshop on Machine Learning for Signal Processing
- Place of Presentation
  Salerno, Italy
- Date
  2016-09-13 – 2016-09-16
- International conference
[Presentation] Robust Speech Recognition using Generalized Distillation Framework (2016)
- Author(s)/Presenter(s)
  K. Markov, T. Matsui
- Conference
  Interspeech
- Place of Presentation
  San Francisco, USA
- Date
  2016-09-08 – 2016-09-12
- International conference
