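Deepening Innovative Fundamental Microphone Array Technologies for the Recognition and Understanding of Acoustic Environments (音環境の認識と理解のための革新的マイクロホンアレー基盤技術の深化)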

Research Project

Project/Area Number	23K28113
Project/Area Number (Other)	23H03423 (2023)
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Multi-year Fund (2024); Single-year Grants (2023)
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	Waseda University
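Principal Investigator	Shoji Makino, Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems), Specially Appointed Professor (60396190)
Co-Investigator(Kenkyū-buntansha)	Takeshi Yamada, University of Tsukuba, Faculty of Engineering, Information and Systems, Professor (20312829); Hiroshi Saruwatari, The University of Tokyo, Graduate School of Information Science and Technology, Professor (30324974)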
Project Period (FY)	2023-04-01 – 2026-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥18,590,000 (Direct Cost: ¥14,300,000, Indirect Cost: ¥4,290,000); Fiscal Year 2025: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000); Fiscal Year 2024: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000); Fiscal Year 2023: ¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000)
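Keywords	Acoustic information processing / Acoustic signal processing / Microphone array / Recognition and understanding of acoustic environments
Outline of Research at the Start	1) Develop an integrated technique for source separation, noise reduction, and dereverberation. 2) Fuse source separation with time-frequency switching filters and build a theory optimized over the entire range of underdetermined and overdetermined conditions. 3) Develop weakly supervised and unsupervised (unlabeled) learning methods that drastically reduce the cost of labeling big data.
Outline of Annual Research Achievements	[Item 1] [Integration of source separation, noise reduction, and dereverberation, and handling of moving sources] Source separation, noise reduction, and dereverberation have so far been developed separately through different approaches. We developed a method that formulates all three as the minimization of a single error function and optimizes them jointly. We then generalized the method to microphone arrays that use time-frequency switching filters and introduced a more powerful optimization criterion. Results this fiscal year: 4 journal papers, 4 international conference presentations, and 2 domestic conference presentations.
[Item 2] [New developments of the time-frequency switching filter as a complex filter] The time-frequency switching beamformer formed spatial directivity by using complex filters that control the phase differences between microphones. Extending this concept to source separation, noise reduction, and dereverberation gives the time-frequency switching filter, with which we studied unified array signal processing that delivers high-quality output regardless of the number of sources. By combining the time-frequency switching filter with the integrated separation, noise reduction, and dereverberation method, we built a method that handles both overdetermined problems (fewer sources than microphones) and underdetermined problems (fewer microphones than sources). Results this fiscal year: 2 journal papers and 1 international conference presentation.
[Item 3] [Understanding of acoustic environments and universal sound separation] With applications such as robot hearing in mind, we studied universal sound separation, which targets all kinds of sounds in the world. Acoustic environments were analyzed and understood on the basis of features extracted from the enhanced source signals. To improve classification accuracy, we also studied the multimodal use of linguistic and audio information and recent deep learning techniques from speech recognition, such as generative adversarial networks (GANs). Results this fiscal year: 2 international conference presentations and 3 domestic conference presentations.
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	Research progressed smoothly, yielding 6 journal papers, 7 international conference presentations, and 5 domestic conference presentations.
Strategy for Future Research Activity	[Item 1] [Integration of source separation, noise reduction, and dereverberation, and handling of moving sources] Source separation, noise reduction, and dereverberation have so far been developed separately through different approaches. We will develop a method that formulates all three as the minimization of a single error function and optimizes them jointly. We will then generalize the method to microphone arrays that use time-frequency switching filters and introduce a more powerful optimization criterion. In addition, we will study techniques for real-time operation that optimize performance while reducing the computational cost.
[Item 2] [New developments of the time-frequency switching filter as a complex filter] The time-frequency switching beamformer formed spatial directivity by using complex filters that control the phase differences between microphones. Using the time-frequency switching filter, which extends this concept to source separation, noise reduction, and dereverberation, we will study unified array signal processing that delivers high-quality output regardless of the number of sources. By combining the time-frequency switching filter with the integrated separation, noise reduction, and dereverberation method, we will build a method that handles both overdetermined problems (fewer sources than microphones) and underdetermined problems (fewer microphones than sources).
[Item 3] [Understanding of acoustic environments and universal sound separation] With applications such as robot hearing in mind, we will study universal sound separation, which targets all kinds of sounds in the world. Acoustic environments will be analyzed and understood on the basis of features extracted from the enhanced source signals. To exploit prior knowledge about the source signals and cope with the growing number of source types, we will study a scheme that assists separation by supplying word (text) labels. We will also use classification in the feature domain. To improve classification accuracy, we will further study the multimodal use of linguistic and audio information and recent deep learning techniques from speech recognition, such as generative adversarial networks (GANs).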
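The record does not specify how the time-frequency switching filter of Item 2 is designed. As a rough illustration of the underlying idea only (complex filters that control inter-microphone phase differences, with a different filter selected in each time-frequency bin), the following sketch assumes a two-microphone array, a single frequency bin, far-field steering vectors parameterized by an inter-microphone phase difference, and a bank of null-steering filters that each pass the target without distortion. The per-bin selection rule (minimum output power) and all function names (`steering`, `null_filter`, `apply_filter`, `switching_filter`) are hypothetical and are not taken from the project's publications.

```python
import cmath


def steering(phase):
    """Two-microphone steering vector [1, exp(-j*phase)] for one frequency bin.

    `phase` is the inter-microphone phase difference of a far-field source
    (an assumption made for this sketch).
    """
    return [1.0 + 0j, cmath.exp(-1j * phase)]


def null_filter(target_phase, null_phase):
    """Complex row filter g = w^H with g.a_target = 1 and g.a_null = 0.

    Solves the 2x2 system [a_target; a_null] g^T = [1, 0]^T by Cramer's rule;
    the two phases must differ so that the determinant is nonzero.
    """
    a_t = steering(target_phase)
    a_n = steering(null_phase)
    det = a_t[0] * a_n[1] - a_t[1] * a_n[0]
    return [a_n[1] / det, -a_n[0] / det]


def apply_filter(g, x):
    """Filter output y = g . x for one time-frequency bin."""
    return sum(gm * xm for gm, xm in zip(g, x))


def switching_filter(filters, x):
    """Select, for one time-frequency bin, the filter with the lowest output power.

    Returns (index of the selected filter, its output).
    """
    outputs = [apply_filter(g, x) for g in filters]
    k = min(range(len(filters)), key=lambda i: abs(outputs[i]) ** 2)
    return k, outputs[k]


if __name__ == "__main__":
    # Hypothetical bank: target at phase 0, one null filter per interferer direction.
    filters = [null_filter(0.0, 1.0), null_filter(0.0, 2.0)]
    a_t, a_1 = steering(0.0), steering(1.0)
    # A bin where only interferer 1 (amplitude 2) overlaps the unit-amplitude target.
    x = [a_t[m] + 2.0 * a_1[m] for m in range(2)]
    k, y = switching_filter(filters, x)
    print(k, abs(y))  # filter 0 (the null toward interferer 1) is selected; |y| is about 1
```

Because each filter only has to cancel one interferer direction, switching between filters bin by bin is one way a two-microphone array can suppress more interferers than it could null at once, provided the interferers rarely overlap in the same time-frequency bin (a sparsity assumption); when several interferers share a bin, the minimum-power rule in this sketch no longer guarantees that the target alone survives.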

Report

(1 result)

2023 Annual Research Report

Research Products
(18 results)


Journal Article: 6 results (Int'l Joint Research: 1, Peer Reviewed: 6, Open Access: 6); Presentation: 12 results (Int'l Joint Research: 7, Invited: 2)

[Journal Article] Blind and spatially-regularized online joint optimization of source separation, dereverberation, and noise reduction (2024)
- Author(s)
  T. Ueda, T. Nakatani, R. Ikeshita, K. Kinoshita, S. Araki, and S. Makino
- Journal Title
  IEEE/ACM Trans. Audio, Speech, and Language Processing
  Volume: 32 Pages: 1157-1172
- DOI
  10.1109/taslp.2024.3351353
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Real-Time Moving Blind Source Extraction Based on Constant Separating Vector and Auxiliary Function Technique (2023)
- Author(s)
  S. Yuan, T. Ueda, and S. Makino
- Journal Title
  Journal of Signal Processing
  Volume: 27 Issue: 4 Pages: 81-85
- DOI
  10.2299/jsp.27.81
- ISSN
  1342-6230, 1880-1013
- Year and Date
  2023-07-01
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Deep Complex-Valued Neural Network-Based Triple-Path Mask and Steering Vector Estimation for Multichannel Target Speech Separation (2023)
- Author(s)
  M. Qin, L. Li, and S. Makino
- Journal Title
  Journal of Signal Processing
  Volume: 27 Issue: 4 Pages: 87-91
- DOI
  10.2299/jsp.27.87
- ISSN
  1342-6230, 1880-1013
- Year and Date
  2023-07-01
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Audio signal processing in the 21st century (2023)
- Author(s)
  G. Richard, P. Smaragdis, S. Gannot, P. Naylor, S. Makino, W. Kellermann, and A. Sugiyama
- Journal Title
  IEEE Signal Processing Magazine
  Volume: 40 Issue: 5 Pages: 12-26
- DOI
  10.1109/msp.2023.3276171
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Wavelength-proportional interpolation and extrapolation of virtual microphone for underdetermined speech enhancement (2023)
- Author(s)
  R. Jinzai, K. Yamaoka, S. Makino, N. Ono, M. Matsumoto, and T. Yamada
- Journal Title
  APSIPA Trans. Signal and Information Processing
  Volume: 12 Issue: 3 Pages: 1-22
- DOI
  10.1561/116.00000078
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Virtual microphone technique for binauralization for multiple sound images on 2-channel stereo signals detected by microphones mounted closely (2023)
- Author(s)
  R. Jinzai, K. Yamaoka, S. Makino, N. Ono, T. Yamada, and M. Matsumoto
- Journal Title
  APSIPA Trans. Signal and Information Processing
  Volume: 12 Issue: 1 Pages: 1-21
- DOI
  10.1561/116.00000079
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] Moving interference speaker removal using geometrically constrained independent vector analysis (2023)
- Author(s)
  S. Furunaga, T. Ueda, and S. Makino
- Organizer
  Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Spatially-regularized switching independent vector analysis (2023)
- Author(s)
  T. Ueda, T. Nakatani, R. Ikeshita, S. Araki, and S. Makino
- Organizer
  Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] On joint dereverberation and source separation with geometrical constraints and iterative source steering (2023)
- Author(s)
  K. Mo, X. Wang, Y. Yang, T. Ueda, S. Makino, and J. Chen
- Organizer
  Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Geometrically constrained blind moving source extraction based on constant separation vector and auxiliary function technique (2023)
- Author(s)
  R. Zhang, T. Ueda, and S. Makino
- Organizer
  Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Enhancing spectrogram for audio classification using time-frequency enhancer (2023)
- Author(s)
  H. Xing, S. Zhang, D. Takeuchi, D. Niizumi, N. Harada, and S. Makino
- Organizer
  Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Constant separating vector-based blind source extraction and dereverberation for a moving speaker (2023)
- Author(s)
  T. Ueda and S. Makino
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Masked modeling duo vision transformer with multi-layer feature fusion on respiratory sound classification (2023)
- Author(s)
  B. Liu, S. Zhang, D. Takeuchi, D. Niizumi, N. Harada, and S. Makino
- Organizer
  Detection and Classification of Acoustic Scenes and Events (DCASE)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
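[Presentation] Multi-microphone extension and evaluation of diffusion-model-based speech enhancement (2023)
- Author(s)
  R. Kimura, T. Nakatani, N. Kamo, M. Delcroix, S. Araki, and S. Makino
- Organizer
  Proceedings of the 2024 Spring Meeting of the Acoustical Society of Japan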
- Related Report
  2023 Annual Research Report
[Presentation] Geometrically constrained moving interference suppression with estimated moving range (2023)
- Author(s)
  M. Song, T. Ueda, and S. Makino
- Organizer
  Proceedings of the 2023 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2023 Annual Research Report
[Presentation] Phase derivative aware speech enhancement (2023)
- Author(s)
  S. Zhang, D. Takeuchi, D. Niizumi, N. Harada, and S. Makino
- Organizer
  Proceedings of the 2023 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2023 Annual Research Report
[Presentation] Gated multi mini-patch extractor for pooling in audio classification (2023)
- Author(s)
  B. He, S. Zhang, Z. Qiu, D. Takeuchi, D. Niizumi, N. Harada, and S. Makino
- Organizer
  Proceedings of the 2023 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2023 Annual Research Report
[Presentation] Time-frequency feature extractor applied for audio classification (2023)
- Author(s)
  H. Xing, S. Zhang, D. Takeuchi, D. Niizumi, N. Harada, and S. Makino
- Organizer
  Proceedings of the 2023 Autumn Meeting of the Acoustical Society of Japan
- Related Report
  2023 Annual Research Report
