
Development of multi-lingual speech-based emotion recognition system by using heterogeneous emotional speech corpus

Research Project

Project/Area Number 19K12059
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution National Institute of Advanced Industrial Science and Technology

Principal Investigator

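LEE SHI-WOOK  National Institute of Advanced Industrial Science and Technology, Department of Information Technology and Human Factors, Senior Researcher (50415642)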

Project Period (FY) 2019-04-01 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2020: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2019: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Keywords speech emotion recognition / speech signal processing / machine learning / pattern recognition / deep learning / emotion recognition
Outline of Research at the Start

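This study aims to lay the academic foundations for speech-based emotion recognition technology that can integrate, from the speech signal, linguistic meaning with paralinguistic and non-linguistic information such as intention, will, and emotion.
Humans express and perceive emotion in speech subjectively. In addition, the emotional speech corpora developed so far for various languages use different category schemes, which has been a critical weakness for recognition and classification tasks that require large-scale training data. On the other hand, emotion is also regarded as a universal language without language barriers. In outline, this study seeks universal features in emotional speech from Japanese and English, two languages that are highly heterogeneous both culturally and linguistically, and attempts to build a general-purpose model.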

Outline of Final Research Achievements

In this study, we constructed a common speech emotion feature space shared by two heterogeneous languages, Japanese and English, by building a system based on feature normalization and multi-task learning. In particular, in the language-independent task of inputting Japanese speech into a system built entirely from English speech, the proposed triplet network improved performance by 35.61 percentage points, from 45.05% to 80.66%. We also proposed an ensemble method based on domain adversarial neural networks. As an individual system, a domain adversarial neural network performs worse than domain-dependent multi-task learning, but the proposed ensemble method reverses this ordering and achieves higher performance.
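A triplet network of the kind mentioned above is commonly trained with a triplet margin loss, which pulls an anchor embedding toward a same-emotion sample and pushes it away from a different-emotion sample. The sketch below is a generic illustration of that loss only, not the project's implementation: the function names, the margin value, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is. The margin of 1.0 is an assumed default.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical toy embeddings (assumed values, not taken from the project).
anchor = [0.0, 0.0]    # e.g. a Japanese "happy" utterance
far_same = [3.0, 4.0]  # same emotion, other language, distance 5 from anchor
near_diff = [0.0, 1.0] # different emotion, distance 1 from anchor

# Same-emotion sample is farther than the different-emotion one: loss is positive.
loss_violated = triplet_loss(anchor, far_same, near_diff)

# Same-emotion sample is closer by more than the margin: loss is zero.
loss_satisfied = triplet_loss(anchor, near_diff, far_same)

print(loss_violated, loss_satisfied)
```

Minimizing this loss over triplets drawn across corpora is one way an embedding space could be encouraged to group utterances by emotion rather than by language.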

Academic Significance and Societal Importance of the Research Achievements

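In contrast to corpora in speech recognition, a field where practical deployment has been remarkably successful, emotional speech has so little training data that it can be called a low-resource problem, and practical deployment has remained difficult. This study constructs a universal feature space for emotional speech from multilingual emotional speech corpora, and it has academic significance as a core research issue for realizing affective communication. In addition, through multi-task learning that simultaneously optimizes three tasks (language, gender, and emotion) and an ensemble method, the multilingual system outperformed the single systems for both Japanese and English. This result can be said to have high societal significance for the development of communication machines that empathize with humans.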

Report

(4 results)
  • 2021 Annual Research Report
  • Final Research Report (PDF)
  • 2020 Research-status Report
  • 2019 Research-status Report

Research Products

(9 results)

Journal Article (3 results) (of which Peer Reviewed: 3 results), Presentation (6 results) (of which Int'l Joint Research: 5 results)

  • [Journal Article] Frame-level Matching Method between Maximum Likelihood State Sequence of Spoken Query and Spoken Documents in Spoken Term Detection (2020)

    • Author(s)
      伊藤 慶明、岩崎 瑛太郎、金子 大祐、小嶋 和徳、李 時旭
    • Journal Title

      電子情報通信学会論文誌D 情報・システム

      Volume: J103-D Issue: 12 Pages: 919-928

    • DOI

      10.14923/transinfj.2020JDP7030

    • ISSN
      1880-4535, 1881-0225
    • Year and Date
      2020-12-01
    • Related Report
      2020 Research-status Report
    • Peer Reviewed
  • [Journal Article] 音声中の検索語検出におけるクエリの関連語を利用したリスコアリング方式 (2020)

    • Author(s)
      丹治遥,小嶋和徳,李時旭,南條浩輝, 伊藤慶明
    • Journal Title

      情報処理学会論文誌

      Volume: 61 Pages: 103-112

    • NAID

      170000181608

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] ICASSP 2019 (2019)

    • Author(s)
      Shi-wook Lee
    • Journal Title

      IEEE Signal Processing Magazine

      Volume: 35 Issue: 4 Pages: 5881-5885

    • DOI

      10.1109/msp.2018.2834838

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Presentation] ENSEMBLE OF DOMAIN ADVERSARIAL NEURAL NETWORKS FOR SPEECH EMOTION RECOGNITION (2021)

    • Author(s)
      Shi-wook Lee
    • Organizer
      IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU2021)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Multiple Deep Learning Models and Architectures with Different Numbers of States Used to Improve Retrieval Accuracy of Query-by-Example (2021)

    • Author(s)
      Kazuki Hatakeyama, Masahiro Nishino, Kazunori Kojima, Shi-wook Lee, Yoshiaki Itoh
    • Organizer
      13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] DOMAIN GENERALIZATION WITH TRIPLET NETWORK FOR CROSS-CORPUS SPEECH EMOTION RECOGNITION (2021)

    • Author(s)
      Shi-wook Lee
    • Organizer
      2021 IEEE Spoken Language Technology Workshop (SLT)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] 異種・複数の深層学習モデルを用いた音声中の検索語検出方式の高精度・低メモリ化 (2021)

    • Author(s)
      西野将弘,小嶋和徳,李時旭,伊藤慶明
    • Organizer
      日本音響学会春季研究発表会
    • Related Report
      2020 Research-status Report
  • [Presentation] Reduction of Speech Data Posteriorgrams by Compressing Maximum-likelihood State Sequences in Query by Example (2020)

    • Author(s)
      Takashi Yokota, Kazunori Kojima, Shi-wook Lee, Yoshiaki Itoh
    • Organizer
      APSIPA-ASC2020
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] A Rescoring Method Using Web Search and Word Vectors for Spoken Term Detection (2020)

    • Author(s)
      H. Tanji, K. Kojima, H. Nanjo, S. Lee, and Y. Itoh
    • Organizer
      APSIPA-ASC2019
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research


Published: 2019-04-18   Modified: 2023-01-30  
