2021 Fiscal Year Final Research Report

A Universal Audio Understanding Model for Localization, Separation, and Classification of Various Sounds

Project/Area Number	20K21813
Research Category	Grant-in-Aid for Challenging Research (Exploratory)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 61: Human informatics and related fields
Research Institution	Kyoto University
Principal Investigator	Yoshii Kazuyoshi Kyoto University, Graduate School of Informatics, Associate Professor (20510001)
Project Period (FY)	2020-07-30 – 2022-03-31
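Keywords	Acoustic signal processing / Source separation / Dereverberation / Deep learning / Maximum likelihood estimation / Speech enhancement / Speech recognition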
Outline of Final Research Achievements	Our goal is to formulate a universal audio understanding model for various kinds of sounds including speech, music, and environmental sounds. More specifically, we have improved the source and spatial models and the likelihood function of the state-of-the-art blind source separation (BSS) method called FastMNMF and achieved joint optimization of FastMNMF with separation and reverberation models. We also tackled integration of speech enhancement and recognition.
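Free Research Field	Acoustic signal processing
Academic Significance and Societal Importance of the Research Achievements	Through this research, we were able to provide a degree of constructive explanation and statistical evidence for the emergent aspect of the human ability to understand sound, namely the ability to understand sounds individually through interaction with real environments in which diverse sounds overlap, without being taught the correct answers. On the technical side, by moving away from supervised learning of deep learning models, which presupposes paired data, and making unsupervised learning based on a likelihood-maximization framework the main approach, we opened a path toward the use of large-scale acoustic signal data.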