2021 Fiscal Year Final Research Report
A Universal Audio Understanding Model for Localization, Separation, and Classification of Various Sounds
Project/Area Number | 20K21813 |
Research Category | Grant-in-Aid for Challenging Research (Exploratory) |
Allocation Type | Multi-year Fund |
Review Section | Medium-sized Section 61: Human informatics and related fields |
Research Institution | Kyoto University |
Principal Investigator | |
Project Period (FY) | 2020-07-30 – 2022-03-31 |
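Keywords | Acoustic signal processing / Source separation / Dereverberation / Deep learning / Maximum likelihood estimation / Speech enhancement / Speech recognition |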
Outline of Final Research Achievements |
Our goal is to formulate a universal audio understanding model for various kinds of sounds, including speech, music, and environmental sounds. More specifically, we improved the source model, the spatial model, and the likelihood function of FastMNMF, a state-of-the-art blind source separation (BSS) method, and achieved joint optimization of FastMNMF with separation and reverberation models. We also worked on integrating speech enhancement with speech recognition.
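As a hedged illustration of what "the likelihood function of FastMNMF" refers to, the sketch below evaluates the log-likelihood of a multichannel STFT under a FastMNMF-style model. In this family of models, the spatial covariance matrices of all sources in frequency bin f are jointly diagonalized by a matrix Q_f, so the projected channels Q_f x_ft become independent complex Gaussians with variances y_ftm = Σ_n g_nm λ_nft, where λ_nft = Σ_k w_nfk h_nkt is an NMF source model. The log-likelihood is then Σ_ft [ -Σ_m ( |(Q_f x_ft)_m|² / y_ftm + log y_ftm ) + 2 log|det Q_f| ] - FTM log π. The function name, array shapes, and the choice of frequency-independent weights g_nm are assumptions for illustration; this is not the project's implementation, and the improved source, spatial, and likelihood models developed in the project are not reproduced here.

```python
import numpy as np


def fastmnmf_log_likelihood(X, Q, G, W, H):
    """Log-likelihood of a multichannel STFT under a FastMNMF-style model.

    Hypothetical shapes (assumed for this sketch, not taken from the report):
      X: (F, T, M) complex mixture STFT
      Q: (F, M, M) complex diagonalizers; row m of Q[f] is q_{fm}^H
      G: (N, M) nonnegative frequency-independent spatial weights per source
      W: (N, F, K) nonnegative NMF bases, H: (N, K, T) nonnegative activations
    """
    F, T, M = X.shape
    # Source power spectra: lambda_{nft} = sum_k w_{nfk} h_{nkt}.
    lam = np.einsum("nfk,nkt->nft", W, H)
    # Variance of each decorrelated channel: y_{ftm} = sum_n g_{nm} lambda_{nft}.
    Y = np.einsum("nm,nft->ftm", G, lam)
    # Project the mixture into the jointly diagonalizing basis: Q_f x_{ft}.
    Xt = np.einsum("fmn,ftn->ftm", Q, X)
    # log|det Q_f| for every frequency bin (slogdet works on stacked matrices).
    _, logabsdet = np.linalg.slogdet(Q)
    # Per-channel Gaussian terms replace the full M x M quadratic form.
    ll = -np.sum(np.abs(Xt) ** 2 / Y + np.log(Y))
    # Jacobian of the projection, counted once per time frame.
    ll += 2.0 * T * np.sum(logabsdet)
    # Normalization constant of the circular complex Gaussian.
    ll -= F * T * M * np.log(np.pi)
    return float(ll)
```

Because the quadratic form and the log-determinant reduce to elementwise per-channel operations plus log|det Q_f|, each time-frequency bin avoids inverting an M×M covariance matrix. This computational saving is the usual motivation for the joint-diagonalization constraint.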
|
Free Research Field | Acoustic signal processing |
Academic Significance and Societal Importance of the Research Achievements |
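Through this research, we were able to provide a degree of constructive explanation and statistical evidence for an emergent aspect of the human ability to understand sound: the ability to understand individual sounds, without being taught the correct answers, through interaction with real environments in which diverse sounds overlap. Technically, by moving away from supervised learning of deep learning models, which presupposes paired data, and instead centering on unsupervised learning based on the likelihood-maximization framework, we opened a path toward the use of large-scale acoustic signal data.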
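The idea of unsupervised learning by likelihood maximization, with no paired clean/mixed data, can be illustrated in its simplest form by single-channel Itakura-Saito NMF. If each STFT coefficient is modeled as x_ft ~ N_c(0, Σ_k w_fk h_kt), then the negative log-likelihood equals the Itakura-Saito divergence between the power spectrogram V = |X|² and WH, up to a constant. Minimizing that divergence is therefore maximum likelihood estimation from the observation alone. The sketch below uses majorization-minimization multiplicative updates with a square-root exponent, which guarantees the divergence never increases. The function names are hypothetical, and this is a generic textbook technique, not the method developed in the project.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence D(V | V_hat), summed over all bins."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def is_nmf(V, K, n_iter=200, seed=0, eps=1e-12):
    """Fit V ~ W @ H by maximum likelihood (equivalently, minimum IS divergence).

    V must be strictly positive, e.g. a power spectrogram plus a small floor.
    Uses MM multiplicative updates with exponent 1/2, which are monotone for IS.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    history = []
    for _ in range(n_iter):
        # Update the spectral bases W with H fixed.
        V_hat = W @ H
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        # Update the activations H with the new W fixed.
        V_hat = W @ H
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        # Record the divergence (negative log-likelihood up to a constant).
        history.append(is_divergence(V, W @ H))
    return W, H, history
```

For example, fitting `is_nmf(V, K=3)` to the power spectrogram of any recording yields a non-increasing `history` without any reference signal. This monotonic-likelihood property is what allows such models to be trained directly on unlabeled data.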
|