Speech factorization using multi-agent deep learning

Research Project

Project/Area Number 19H04133
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution Tokyo Institute of Technology

Principal Investigator

Shinoda Koichi  Tokyo Institute of Technology, School of Computing, Professor (10343097)

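Co-Investigator (Kenkyū-buntansha) Inoue Nakamasa  Tokyo Institute of Technology, School of Computing, Associate Professor (10733397)
Iwano Koji  Tokyo City University, Faculty of Informatics, Professor (90323823)
Uto Kuniaki  Tokyo Institute of Technology, School of Computing, Assistant Professor (90345356)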
Project Period (FY) 2019-04-01 – 2022-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥17,420,000 (Direct Cost: ¥13,400,000, Indirect Cost: ¥4,020,000)
Fiscal Year 2021: ¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Fiscal Year 2020: ¥5,590,000 (Direct Cost: ¥4,300,000, Indirect Cost: ¥1,290,000)
Fiscal Year 2019: ¥7,150,000 (Direct Cost: ¥5,500,000, Indirect Cost: ¥1,650,000)
Keywords Deep learning / Speech recognition / Speaker recognition / Speaker separation / Emotion recognition
Outline of Research at the Start

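We build a multi-agent deep learning platform in which agents responsible for various speech tasks, such as speech recognition, speech synthesis, and speaker recognition, learn their individual tasks while competing, cooperating, and coordinating with one another. By factorizing speech data using the relations among the speech factors relevant to each task, such as inclusion, exclusion, and sharing, we improve the performance of each task. The goal is to achieve higher performance than multi-task learning when only small, non-uniform data are available.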

Outline of Final Research Achievements

We worked toward a multi-agent deep learning infrastructure in which agents responsible for various speech tasks, such as speech recognition, speech synthesis, and speaker recognition, learn their individual tasks while competing, cooperating, and coordinating with each other. We achieved noise-tolerant speech separation by handling noise explicitly and including it as a separation target. In addition, using the results of speaker recognition and speech recognition, we improved emotion recognition performance by separating speaker and phonological features from the speech features.

Academic Significance and Societal Importance of the Research Achievements

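Speech carries various characteristics, such as phonological content, speaker identity, and emotion. We proposed a methodology that improves performance on tasks such as speech recognition, speaker recognition, and emotion recognition by explicitly modeling the relations among these characteristics, and we confirmed its effectiveness. The approach is applicable to many uses of speech processing, and we have already confirmed that it is effective for diagnosing mental disorders and for assessing human personality. It is also expected to be effective for processing media other than speech, such as images.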

Report

(4 results)
  • 2022 Final Research Report ( PDF )
  • 2021 Annual Research Report
  • 2020 Annual Research Report
  • 2019 Annual Research Report
  • Research Products

    (9 results)

Journal Article (3 results; of which Peer Reviewed: 3, Open Access: 1), Presentation (5 results; of which Invited: 1), Book (1 result)

  • [Journal Article] Multimodal Emotion Recognition with High-Level Speech and Text Features (2021)

    • Author(s)
      Makiuchi Mariana Rodrigues, Uto Kuniaki, Shinoda Koichi
    • Journal Title

      2021 Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

      Volume: 1 Pages: 350-357

    • DOI

      10.1109/asru51503.2021.9688036

    • NAID

      120007192305

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Noise-Tolerant Time-Domain Speech Separation with Noise Bases (2021)

    • Author(s)
      Kohei Ozamoto, Kuniaki Uto, Koji Iwano, Koichi Shinoda
    • Journal Title

      Proc. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

      Volume: 1 Pages: 624-629

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] A Modified Algorithm for Multiple Input Spectrogram Inversion (2019)

    • Author(s)
      Wang Dongxiao, Kameoka Hirokazu, Shinoda Koichi
    • Journal Title

      Proc. ISCA Interspeech 2019

      Volume: 1 Pages: 4569-4573

    • DOI

      10.21437/interspeech.2019-3242

    • NAID

      120006766462

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed / Open Access
  • [Presentation] Personality Recognition on Dyadic Interactions with Representation Learning (2023)

    • Author(s)
      Nathania Nah, Takafumi Koshinaka, Koichi Shinoda
    • Organizer
      IEICE SP / IPSJ-SLP / EA / SIP Technical Meeting
    • Related Report
      2021 Annual Research Report
  • [Presentation] Noise-Tolerant Time-Domain Speech Separation with Noise Bases (2021)

    • Author(s)
      Kohei Ozamoto, Kuniaki Uto, Koji Iwano, Koichi Shinoda
    • Organizer
      Proc. 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
    • Related Report
      2020 Annual Research Report
  • [Presentation] Team Takoyaki submission for VoxCeleb Speaker Recognition Challenge 2020 (2020)

    • Author(s)
      Keisuke Ishikawa, Kuniaki Uto, Koji Iwano, Koichi Shinoda
    • Organizer
      The VoxSRC Workshop
    • Related Report
      2020 Annual Research Report
  • [Presentation] Co-design of ML and HPC for video understanding (2020)

    • Author(s)
      Koichi Shinoda
    • Organizer
      1st International Workshop on Deep Video Understanding (DVU 2020)
    • Related Report
      2020 Annual Research Report
    • Invited
  • [Presentation] Improving the robustness of multiple input spectrogram inversion (2019)

    • Author(s)
      Dongxiao Wang, Hirokazu Kameoka, Koichi Shinoda
    • Organizer
      Acoustical Society of Japan 2019 Spring Meeting
    • Related Report
      2019 Annual Research Report
  • [Book] 音声(下) (Speech, Vol. 2) (2022)

    • Author(s)
      Acoustical Society of Japan, Iwano Koji, Kawahara Tatsuya, Shinoda Koichi, Ito Akinori, Masumura Ryo, Ogawa Tetsuji, Komatani Kazunori
    • Total Pages
      208
    • Publisher
      Corona Publishing (コロナ社)
    • ISBN
      9784339013672
    • Related Report
      2021 Annual Research Report

Published: 2019-04-18   Modified: 2024-01-30  
