A Universal Audio Understanding Model for Localization, Separation, and Classification of Various Sounds

Research Project

Project/Area Number	20K21813
Research Category	Grant-in-Aid for Challenging Research (Exploratory)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 61: Human informatics and related fields
Research Institution	Kyoto University
Principal Investigator	Yoshii Kazuyoshi, Kyoto University, Graduate School of Informatics, Associate Professor (20510001)
Project Period (FY)	2020-07-30 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000); Fiscal Year 2021: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000); Fiscal Year 2020: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
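Keywords	acoustic signal processing / source separation / dereverberation / deep learning / maximum likelihood estimation / speech enhancement / speech recognition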
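Outline of Research at the Start	This research addresses the formulation of a unified deep generative model of diverse acoustic signals with physical constraints, and unsupervised learning as the corresponding inverse problem. We formulate a universal acoustic generative model that can represent arbitrary spatial and source characteristics, have it autonomously categorize the various conditions, and integrate it with back-end tasks (speech recognition and sound event detection).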
Outline of Final Research Achievements	Our goal is to formulate a universal audio understanding model for various kinds of sounds including speech, music, and environmental sounds. More specifically, we have improved the source and spatial models and the likelihood function of the state-of-the-art blind source separation (BSS) method called FastMNMF and achieved joint optimization of FastMNMF with separation and reverberation models. We also tackled integration of speech enhancement and recognition.
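Academic Significance and Societal Importance of the Research Achievements	Through this research, we were able to provide a degree of constructive explanation and statistical evidence for an emergent aspect of the human ability to understand sound: the ability to understand sounds individually, without being taught the correct answers, through interaction with real environments in which diverse sounds overlap. Technically, by moving away from supervised learning of deep learning models that presupposes paired data and making unsupervised learning based on likelihood maximization the main approach, we opened a path toward the use of large-scale acoustic signal data.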

Report

(3 results)

2021 Annual Research Report / Final Research Report (PDF)
2020 Research-status Report

Research Products
(12 results)

All Journal Article (5 results) (of which Int'l Joint Research: 3 results, Peer Reviewed: 5 results, Open Access: 4 results) Presentation (7 results) (of which Int'l Joint Research: 4 results)

[Journal Article] Neural Full-Rank Spatial Covariance Analysis for Blind Source Separation (2021)
- Author(s)
  Yoshiaki Bando, Kouhei Sekiguchi, Yoshiki Masuyama, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
- Journal Title
  IEEE Signal Processing Letters
  Volume: 28 Pages: 1670-1674
- DOI
  10.1109/lsp.2021.3101699
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] MirrorNet: A Deep Reflective Approach to 2D Pose Estimation for Single-Person Images (2021)
- Author(s)
  Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima
- Journal Title
  Journal of Information Processing
  Volume: 29 Issue: 0 Pages: 406-423
- DOI
  10.2197/ipsjjip.29.406
- NAID
  130008038621
- ISSN
  1882-6652
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Computationally-Efficient Overdetermined Blind Source Separation Based on Iterative Source Steering (2021)
- Author(s)
  Yicheng Du, Robin Scheibler, Masahito Togami, Kazuyoshi Yoshii, Tatsuya Kawahara
- Journal Title
  IEEE Signal Processing Letters
  Volume: 29 Pages: 927-931
- DOI
  10.1109/lsp.2021.3134939
- Related Report
  2021 Annual Research Report
- Peer Reviewed
[Journal Article] Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation (2020)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 28 Pages: 2610-2625
- DOI
  10.1109/taslp.2020.3019181
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Flow-Based Independent Vector Analysis for Blind Source Separation (2020)
- Author(s)
  Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
- Journal Title
  IEEE Signal Processing Letters
  Volume: 27 Pages: 2173-2177
- DOI
  10.1109/lsp.2020.3039944
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Alpha-Stable Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Speech Enhancement and Dereverberation (2021)
- Author(s)
  Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii
- Organizer
  Annual Conference of the International Speech Communication Association (Interspeech)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Gamma Process FastMNMF for Separating an Unknown Number of Sound Sources (2021)
- Author(s)
  Yoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
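[Presentation] Pitch-Timbre Disentangled Representation of Musical Instrument Sounds by Metric Learning Using a Variational Autoencoder (2021)
- Author(s)
  Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
- Organizer
  The 131st Meeting of the IPSJ Special Interest Group on Music and Computer (SIGMUS)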
- Related Report
  2021 Annual Research Report
[Presentation] Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Blind Source Separation and Dereverberation (2021)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Pitch-Timbre Disentanglement of Musical Instrument Sounds Based on VAE-Based Metric Learning (2021)
- Author(s)
  Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
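[Presentation] Joint Blind Source Separation and Dereverberation Based on ARMA-FastMNMF (2021)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
- Organizer
  Acoustical Society of Japan (ASJ) 2021 Spring Meeting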
- Related Report
  2020 Research-status Report
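[Presentation] Linear Time-Varying Determined Blind Source Separation Based on NF-IVA (2021)
- Author(s)
  Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
- Organizer
  Acoustical Society of Japan (ASJ) 2021 Spring Meeting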
- Related Report
  2020 Research-status Report