2022 Fiscal Year Final Research Report

Research on acoustic scene analysis by integrating time-domain deep learning and multiresolution analysis

Project/Area Number	20K19818
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	The University of Tokyo
Principal Investigator	Nakamura Tomohiko The University of Tokyo, Graduate School of Information Science and Technology, Project Assistant Professor (50866308)
Project Period (FY)	2020-04-01 – 2023-03-31
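Keywords	acoustic scene analysis / time-domain deep learning / multiresolution analysis / audio source separation / acoustic signal processing / deep learning / machine learning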
Outline of Final Research Achievements	In this study, we proposed an audio source separation method called multiresolution deep-layered analysis. The method stems from our finding that the downsampling (DS) architecture of Wave-U-Net, a waveform-domain audio source separation model, resembles multiresolution analysis. Inspired by this resemblance, we developed a DS layer based on the discrete wavelet transform. Music source separation experiments showed that the proposed method achieves higher separation performance than conventional waveform-based methods. We also extended the proposed layer so that its wavelets can be trained together with the other components of a deep neural network. This extension paves the way for obtaining wavelets suited to a target task in an end-to-end manner. Finally, we applied the proposed methods to monaural vocal ensemble separation and multichannel audio source separation, and demonstrated their effectiveness through experiments on both tasks.
Free Research Field	Acoustic signal processing, music signal processing
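Academic Significance and Societal Importance of the Research Achievements	In this study, we created a cross-disciplinary methodology that fuses deep audio source separation models that perform separation directly in the time domain (time-domain deep learning) with multiresolution analysis, which has been developed in signal processing and wavelet analysis. In time-domain deep learning, the parameters of each component are trained so as to achieve high-performance source separation, so the function of each component has not been clear. Multiresolution analysis, in contrast, uses components whose functions are clear, although it must be designed appropriately for each sound source. By integrating the two, the results of this research are a first step toward combining the high performance of deep learning with the high interpretability of signal processing.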
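
The idea of a downsampling layer built on the discrete wavelet transform, mentioned in the outline above, can be illustrated with a minimal sketch. This is not the project's implementation: the Haar wavelet, the function names (haar_dwt_downsample, haar_idwt_upsample, decimate), and the sample signal are all assumptions chosen for illustration, and the report does not state which wavelets were used. The sketch only shows the general property that motivates such a layer: a wavelet transform halves the time resolution while keeping both a low-frequency and a high-frequency subband, so the input can be recovered exactly, whereas plain decimation discards every other sample.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt_downsample(x):
    """Split an even-length signal into two half-rate subbands (Haar wavelet, assumed)."""
    if len(x) % 2 != 0:
        raise ValueError("signal length must be even")
    # Low-pass subband: scaled sums of neighbouring samples.
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    # High-pass subband: scaled differences of neighbouring samples.
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high


def haar_idwt_upsample(low, high):
    """Invert haar_dwt_downsample, restoring the original sampling rate."""
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / SQRT2)
        x.append((l - h) / SQRT2)
    return x


def decimate(x):
    """Plain downsampling for comparison: keep every other sample, drop the rest."""
    return x[::2]


# Hypothetical toy waveform, used only to exercise the functions.
signal = [1.0, 3.0, -2.0, 0.0, 4.0, 4.0, -1.0, 5.0]
low, high = haar_dwt_downsample(signal)
restored = haar_idwt_upsample(low, high)
kept = decimate(signal)
```

Here low and high each have half the length of signal, and restored matches signal up to floating-point rounding, while kept alone cannot recover the dropped samples. In a neural network, one possible arrangement (again an assumption, not taken from the report) would be to pass both subbands on as separate channels so that later layers see the full information at the lower time resolution.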