2021 Fiscal Year Annual Research Report

Universal Acoustic Understanding Model for Localization, Separation, and Classification of Any Sound

Research Project

Project/Area Number	20K21813
Research Institution	Kyoto University
Principal Investigator	Kazuyoshi Yoshii, Kyoto University, Graduate School of Informatics, Associate Professor (20510001)
Project Period (FY)	2020-07-30 – 2022-03-31
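Keywords	Acoustic signal processing / Source separation / Dereverberation
Outline of Annual Research Achievements	In FY2021, we refined the formulation and inference methods of the universal acoustic generative model we have been developing, which integrates a source model and a spatial model, and we worked on integrating it with speech recognition and applying it to music data analysis. Specifically, for environments in which the number of sources is unknown, we first devised a deep nonparametric Bayesian acoustic generative model that introduces a gamma process as a generative model over the deep source models, so that an appropriate number of sources can be estimated according to the complexity of the observed data. Second, to improve robustness to reverberation, we replaced the complex Gaussian distribution in the acoustic generative model with a complex stable distribution whose tail heaviness reflects the characteristics of each source, which improved the performance of joint blind source separation and dereverberation. Third, we paired the deep generative model of multichannel spectrograms with a deep inference model for multichannel source separation to form a variational autoencoder (VAE), making it possible to train both models jointly without supervision. This established a basic technology for fast online inference that does not require expensive paired data. Building on these basic technologies, we also began developing a real-time environment understanding system that integrates speech enhancement and speech recognition. As an application beyond speech data analysis, we devised a method that uses a VAE to learn a generative model of instrument-sound spectrograms with pitch and timbre as latent variables, intended as a universal source model of musical instrument sounds that can also handle unknown instruments.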

Research Products
(8 results)

Journal Article (3 results) (of which Peer Reviewed: 3 results) / Presentation (5 results) (of which Int'l Joint Research: 4 results)

[Journal Article] Neural Full-Rank Spatial Covariance Analysis for Blind Source Separation (2021)
- Author(s)
  Yoshiaki Bando, Kouhei Sekiguchi, Yoshiki Masuyama, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
- Journal Title
  IEEE Signal Processing Letters
  Volume: 28 Pages: 1670-1674
- DOI
  10.1109/lsp.2021.3101699
- Peer Reviewed
[Journal Article] MirrorNet: A Deep Reflective Approach to 2D Pose Estimation for Single-Person Images (2021)
- Author(s)
  Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima
- Journal Title
  Journal of Information Processing
  Volume: 29 Pages: 406-423
- DOI
  10.2197/ipsjjip.29.406
- Peer Reviewed
[Journal Article] Computationally-Efficient Overdetermined Blind Source Separation Based on Iterative Source Steering (2021)
- Author(s)
  Yicheng Du, Robin Scheibler, Masahito Togami, Kazuyoshi Yoshii, Tatsuya Kawahara
- Journal Title
  IEEE Signal Processing Letters
  Volume: 29 Pages: 927-931
- DOI
  10.1109/lsp.2021.3134939
- Peer Reviewed
[Presentation] Alpha-Stable Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Speech Enhancement and Dereverberation (2021)
- Author(s)
  Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii
- Organizer
  Annual Conference of the International Speech Communication Association (Interspeech)
- Int'l Joint Research
[Presentation] Gamma Process FastMNMF for Separating an Unknown Number of Sound Sources (2021)
- Author(s)
  Yoshiaki Bando, Kouhei Sekiguchi, Kazuyoshi Yoshii
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Int'l Joint Research
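[Presentation] Pitch-Timbre Disentangled Representation of Musical Instrument Sounds by Metric Learning Using a Variational Autoencoder (2021)
- Author(s)
  Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
- Organizer
  The 131st IPSJ Special Interest Group on Music and Computer (SIGMUS) Meeting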
[Presentation] Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Blind Source Separation and Dereverberation (2021)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Pitch-Timbre Disentanglement of Musical Instrument Sounds Based on VAE-Based Metric Learning (2021)
- Author(s)
  Keitaro Tanaka, Ryo Nishikimi, Yoshiaki Bando, Kazuyoshi Yoshii, Shigeo Morishima
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
