2021 Fiscal Year Annual Research Report

Development of multi-lingual speech-based emotion recognition system by using heterogeneous emotional speech corpus

Research Project

Project/Area Number	19K12059
Research Institution	National Institute of Advanced Industrial Science and Technology
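Principal Investigator	Shi-wook Lee (李 時旭), National Institute of Advanced Industrial Science and Technology, Department of Information Technology and Human Factors, Senior Researcher (50415642)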
Project Period (FY)	2019-04-01 – 2022-03-31
Keywords	speech emotion recognition / speech signal processing / machine learning / pattern recognition / deep learning
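Outline of Annual Research Achievements	To realize more empathetic speech-based communication between humans and machines, this project has used emotional speech from heterogeneous languages to optimize the feature space and improve generalization performance. In speech-based emotion recognition, the expression and perception of human emotion are subjective and also vary readily with environmental factors such as language, culture, and generation. In machine learning, this problem is known as domain shift. This project has aimed to address it by constructing a universal, general-purpose feature space for emotional speech from multiple heterogeneous languages. This fiscal year, for multilingual speech emotion recognition on a Japanese emotional speech corpus and an English emotional speech corpus, we proposed a method that ensembles domain adversarial neural networks (DANN) and obtained improved performance. As individual systems, DANN, which suppresses dependence on the domain, shows lower recognition performance than both an ordinary system with no auxiliary task and multi-task learning (MTL), which strengthens dependence on the domain. When multiple systems are fused in an ensemble, however, the ordering reverses and the DANN ensemble performs best. In other words, DANN removes information other than the target task of emotion, such as language and gender, and in doing so also damages task-relevant information, but we consider that ensembling made it possible to extract factors common across multiple languages and to construct a feature space that combines generalization and discrimination performance. This result was accepted and presented at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU2021), a top-tier international conference in the Acoustics & Sound category of Google Scholar.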
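
The contrast between MTL and DANN described above, and the fusion of several systems, can be illustrated with a minimal Python sketch. It is not the method used in the report. The function names, the weight lam, the emotion labels, and all numbers are hypothetical, and averaging class posteriors is only one possible fusion rule, since the report does not state which rule was used.

```python
from typing import List, Sequence


def shared_gradient(grad_emotion: Sequence[float],
                    grad_aux: Sequence[float],
                    lam: float,
                    adversarial: bool) -> List[float]:
    """Gradient reaching the shared feature extractor.

    MTL adds the auxiliary-task gradient (e.g. language or gender),
    which strengthens domain dependence. DANN reverses its sign
    (gradient reversal), which suppresses domain dependence.
    """
    sign = -1.0 if adversarial else 1.0
    return [ge + sign * lam * ga for ge, ga in zip(grad_emotion, grad_aux)]


def ensemble_average(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class posteriors over systems (assumed fusion rule)."""
    n = len(posteriors)
    return [sum(column) / n for column in zip(*posteriors)]


def predict(posteriors: Sequence[Sequence[float]]) -> int:
    """Index of the class with the highest averaged posterior."""
    avg = ensemble_average(posteriors)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical labels and toy posteriors from three systems.
LABELS = ["angry", "happy", "neutral", "sad"]
toy_posteriors = [
    [0.40, 0.35, 0.15, 0.10],  # system 1 alone would pick "angry"
    [0.30, 0.45, 0.15, 0.10],  # system 2 picks "happy"
    [0.20, 0.50, 0.20, 0.10],  # system 3 picks "happy"
]

if __name__ == "__main__":
    print(LABELS[predict(toy_posteriors)])  # happy
    print(shared_gradient([1.0, 2.0], [0.5, -0.5], 0.5, adversarial=True))
    print(shared_gradient([1.0, 2.0], [0.5, -0.5], 0.5, adversarial=False))
```

In this toy case a single system's decision is overridden once the posteriors are averaged, which is the kind of effect an ensemble relies on. Whether it yields a gain on real corpora depends on how diverse the individual systems are.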

Research Products
(2 results)

All Presentation (2 results) (of which Int'l Joint Research: 2 results)

[Presentation] ENSEMBLE OF DOMAIN ADVERSARIAL NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION (2021)
- Author(s)
  Shi-wook Lee
- Organizer
  IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU2021)
- Int'l Joint Research
[Presentation] Multiple Deep Learning Models and Architectures with Different Numbers of States Used to Improve Retrieval Accuracy of Query-by-Example (2021)
- Author(s)
  Kazuki Hatakeyama, Masahiro Nishino, Kazunori Kojima, Shi-wook Lee, Yoshiaki Itoh
- Organizer
  13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Int'l Joint Research
