
A Universal Audio Understanding Model for Localization, Separation, and Classification of Various Sounds

Research Project

Project/Area Number 20K21813
Research Category

Grant-in-Aid for Challenging Research (Exploratory)

Allocation Type Multi-year Fund
Review Section Medium-sized Section 61: Human informatics and related fields
Research Institution Kyoto University

Principal Investigator

Yoshii Kazuyoshi, Kyoto University, Graduate School of Informatics, Associate Professor (20510001)

Project Period (FY) 2020-07-30 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000)
Fiscal Year 2021: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
Fiscal Year 2020: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
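The budget figures above are internally consistent. The two fiscal-year totals add up to the overall amount, and in each year the indirect cost equals 30% of the direct cost. The short check below uses only the yen amounts listed in this record. The dictionary layout is an illustrative choice.

```python
# Budget amounts for project 20K21813 as listed in this record (yen).
budget = {
    2020: {"direct": 2_800_000, "indirect": 840_000},
    2021: {"direct": 2_100_000, "indirect": 630_000},
}

total_direct = sum(y["direct"] for y in budget.values())
total_indirect = sum(y["indirect"] for y in budget.values())
total = total_direct + total_indirect
print(total_direct, total_indirect, total)  # 4900000 1470000 6370000

# Integer check that indirect cost is 30% of direct cost in every year.
for year, y in budget.items():
    print(year, y["indirect"] * 10 == y["direct"] * 3)
```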
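Keywords Acoustic signal processing / Source separation / Dereverberation / Deep learning / Maximum likelihood estimation / Speech enhancement / Speech recognition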
Outline of Research at the Start

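This research tackles two problems. The first is the formulation of a unified deep generative model of diverse acoustic signals under physical constraints. The second is unsupervised learning of that model, treated as its inverse problem. We formulate a universal acoustic generative model that can represent arbitrary spatial and source characteristics. The model autonomously categorizes various conditions and is integrated with back-end tasks (speech recognition and acoustic event detection).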

Outline of Final Research Achievements

Our goal is to formulate a universal audio understanding model for various kinds of sounds, including speech, music, and environmental sounds. More specifically, we improved FastMNMF, a state-of-the-art blind source separation (BSS) method, in three respects: its source model, its spatial model, and its likelihood function. We also achieved joint optimization of FastMNMF with separation and reverberation models. In addition, we tackled the integration of speech enhancement and speech recognition.
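FastMNMF gains its speed by requiring every source's spatial covariance matrix (SCM) to be jointly diagonalizable by a single matrix Q per frequency bin. The sketch below assumes the standard FastMNMF formulation: each observation satisfies x_ft ~ N_c(0, Y_ft), with Y_ft = Σ_n λ_nft G_nf and G_nf = Q_f^{-1} diag(g_nf) Q_f^{-H}. At a single time-frequency bin, it checks that two routes to the log-likelihood agree. The first route uses a full M×M determinant and linear solve. The second uses only Q x and elementwise operations. The function and variable names are hypothetical, the update rules for Q, g, and λ are omitted, and this is not presented as the project's implementation.

```python
import numpy as np

def loglik_direct(x, Y):
    """log N_c(x; 0, Y) using a full M x M determinant and linear solve."""
    M = x.shape[0]
    _, logdet_Y = np.linalg.slogdet(Y)                  # Y is Hermitian PD, so this is log det Y
    quad = np.real(np.vdot(x, np.linalg.solve(Y, x)))   # x^H Y^{-1} x
    return -M * np.log(np.pi) - logdet_Y - quad

def loglik_fast(x, Q, g, lam):
    """Same quantity when every SCM is G_n = Q^{-1} diag(g[n]) Q^{-H}.

    Q:   (M, M) joint diagonalizer for this frequency bin
    g:   (N, M) nonnegative diagonal entries of each source's SCM
    lam: (N,)   source power spectral densities at this (f, t) bin
    """
    M = x.shape[0]
    y_tilde = lam @ g                    # (M,) diagonal of Q Y Q^H
    z = Q @ x                            # projected observation: z_m = q_m^H x
    _, logabsdet_Q = np.linalg.slogdet(Q)
    return (-M * np.log(np.pi) + 2.0 * logabsdet_Q
            - np.sum(np.log(y_tilde)) - np.sum(np.abs(z) ** 2 / y_tilde))

# Hypothetical toy sizes: M microphones, N sources, one time-frequency bin.
rng = np.random.default_rng(0)
M, N = 3, 2
Q = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
g = rng.uniform(0.1, 1.0, (N, M))
lam = rng.uniform(0.5, 2.0, N)

# Build the full mixture covariance Y = sum_n lam_n G_n explicitly for comparison.
Q_inv = np.linalg.inv(Q)
G = np.stack([Q_inv @ np.diag(g_n) @ Q_inv.conj().T for g_n in g])
Y = np.einsum("n,nij->ij", lam, G)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)

print(loglik_direct(x, Y), loglik_fast(x, Q, g, lam))  # equal up to rounding
```

The fast form replaces an M×M inverse and determinant per time-frequency bin with one determinant of Q per frequency and elementwise divisions.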

Academic Significance and Societal Importance of the Research Achievements

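Through this research, we provided a degree of constructive explanation and statistical evidence for an emergent aspect of human sound understanding. This is the ability to perceive individual sounds without being taught the correct answers, solely through interaction with real environments in which diverse sounds overlap. On the technical side, we moved away from supervised learning of deep learning models, which presupposes paired data. We instead made unsupervised learning based on likelihood maximization the core approach, which opens a path toward exploiting large-scale acoustic signal data.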
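As a much simpler illustration of unsupervised learning by likelihood maximization, the sketch below uses single-channel nonnegative matrix factorization under the Itakura-Saito (IS) divergence (IS-NMF). It is a swapped-in example and not the project's multichannel method. Minimizing the IS divergence between a power spectrogram V and W @ H is equivalent to maximizing the likelihood of complex STFT coefficients modeled as x_ft ~ N_c(0, [WH]_ft). The model is fitted to the observed signal alone, with no paired clean data. The updates are the square-root multiplicative (majorization-minimization) form, which keeps the log-likelihood non-decreasing. The function name `is_nmf` and the synthetic data are hypothetical.

```python
import numpy as np

def is_nmf(V, K, n_iter=100, seed=0):
    """Fit V ~ W @ H by maximizing the complex-Gaussian log-likelihood (IS-NMF)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.uniform(0.1, 1.0, (F, K))
    H = rng.uniform(0.1, 1.0, (K, T))
    history = []
    for _ in range(n_iter):
        V_hat = W @ H
        # Square-root MM update for W (exponent 1/2 guarantees monotonicity for IS).
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        V_hat = W @ H
        # Square-root MM update for H.
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        V_hat = W @ H
        # log p(X) = sum_ft [-log(pi) - log V_hat - V / V_hat]
        history.append(np.sum(-np.log(np.pi) - np.log(V_hat) - V / V_hat))
    return W, H, np.array(history)

# Synthetic "observed" STFT drawn from a low-rank variance model (no clean reference used).
rng = np.random.default_rng(1)
F, T, K = 20, 30, 3
V_true = rng.uniform(0.5, 2.0, (F, K)) @ rng.uniform(0.5, 2.0, (K, T))
X = np.sqrt(V_true / 2) * (rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T)))
V = np.abs(X) ** 2

W, H, ll = is_nmf(V, K)
print(ll[0], ll[-1])  # log-likelihood after the first and last iterations
```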

Report (3 results)
  • 2021 Annual Research Report
  • Final Research Report (PDF)
  • 2020 Research-status Report

Research Products (12 results)

Journal Article: 5 results (of which Int'l Joint Research: 3, Peer Reviewed: 5, Open Access: 4)
Presentation: 7 results (of which Int'l Joint Research: 4)

  • [Journal Article] Neural Full-Rank Spatial Covariance Analysis for Blind Source Separation (2021)

    • Author(s)
      Yoshiaki Bando, Kouhei Sekiguchi, Yoshiki Masuyama, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
    • Journal Title

      IEEE Signal Processing Letters

      Volume: 28 Pages: 1670-1674

    • DOI

      10.1109/lsp.2021.3101699

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] MirrorNet: A Deep Reflective Approach to 2D Pose Estimation for Single-Person Images (2021)

    • Author(s)
      Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima
    • Journal Title

      Journal of Information Processing

      Volume: 29 Issue: 0 Pages: 406-423

    • DOI

      10.2197/ipsjjip.29.406

    • NAID

      130008038621

    • ISSN
      1882-6652
    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Computationally-Efficient Overdetermined Blind Source Separation Based on Iterative Source Steering (2021)

    • Author(s)
      Yicheng Du, Robin Scheibler, Masahito Togami, Kazuyoshi Yoshii, Tatsuya Kawahara
    • Journal Title

      IEEE Signal Processing Letters

      Volume: 29 Pages: 927-931

    • DOI

      10.1109/lsp.2021.3134939

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation (2020)

    • Author(s)
      Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 2610-2625

    • DOI

      10.1109/taslp.2020.3019181

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Flow-Based Independent Vector Analysis for Blind Source Separation (2020)

    • Author(s)
      Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
    • Journal Title

      IEEE Signal Processing Letters

      Volume: 27 Pages: 2173-2177

    • DOI

      10.1109/lsp.2020.3039944

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Alpha-Stable Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Speech Enhancement and Dereverberation (2021)

    • Author(s)
      Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Gamma Process FastMNMF for Separating an Unknown Number of Sound Sources (2021)

    • Author(s)
      Yoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
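  • [Presentation] Pitch-Timbre Disentangled Representation of Musical Instrument Sounds via Metric Learning with a Variational Autoencoder (2021)

    • Author(s)
      Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
    • Organizer
      Information Processing Society of Japan, 131st Meeting of the Special Interest Group on Music and Computer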
    • Related Report
      2021 Annual Research Report
  • [Presentation] Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Blind Source Separation and Dereverberation (2021)

    • Author(s)
      Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Pitch-Timbre Disentanglement of Musical Instrument Sounds Based on VAE-Based Metric Learning (2021)

    • Author(s)
      Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
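  • [Presentation] Joint Blind Source Separation and Dereverberation Based on ARMA-FastMNMF (2021)

    • Author(s)
      Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
    • Organizer
      Acoustical Society of Japan, 2021 Spring Meeting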
    • Related Report
      2020 Research-status Report
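  • [Presentation] Linear Time-Varying Determined Blind Source Separation Based on NF-IVA (2021)

    • Author(s)
      Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
    • Organizer
      Acoustical Society of Japan, 2021 Spring Meeting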
    • Related Report
      2020 Research-status Report


Published: 2020-08-03   Modified: 2023-01-30  


Powered by NII kakenhi