2022 Fiscal Year Final Research Report

Speech factorization using multi-agent deep learning

Project/Area Number	19H04133
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	Tokyo Institute of Technology
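Principal Investigator	Shinoda Koichi, Tokyo Institute of Technology, School of Computing, Professor (10343097)
Co-Investigator(Kenkyū-buntansha)	井上中順, Tokyo Institute of Technology, School of Computing, Associate Professor (10733397); 岩野公司, Tokyo City University, Faculty of Media Informatics, Professor (90323823); 宇都有昭, Tokyo Institute of Technology, School of Computing, Assistant Professor (90345356)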
Project Period (FY)	2019-04-01 – 2022-03-31
Keywords	deep learning / speech recognition / speaker recognition / speaker separation / emotion recognition
Outline of Final Research Achievements	We worked toward a multi-agent deep learning infrastructure in which agents responsible for different speech tasks, such as speech recognition, speech synthesis, and speaker recognition, learn their individual tasks while competing, cooperating, and coordinating with one another. We achieved noise-tolerant speech separation by handling noise explicitly and treating it as one of the separation targets. In addition, we improved emotion recognition performance by using the results of speaker recognition and speech recognition to separate speaker and phonological features from the speech features.
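Free Research Field	Machine learning
Academic Significance and Societal Importance of the Research Achievements	Speech carries many kinds of characteristics, including phonological content, speaker identity, and emotion. We proposed a methodology that improves performance on tasks such as speech recognition, speaker recognition, and emotion recognition by explicitly modeling the relationships among these characteristics, and we confirmed its effectiveness. The methodology is applicable to many uses of speech processing, and we have already confirmed that it is effective for diagnosing mental disorders and for assessing human personality. It is also expected to be effective for processing media other than speech, such as images.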