2023 Fiscal Year Final Research Report

Innovation of speech / acoustic scene recognition based on distributed acoustic sensing and asynchronous sequence modeling

Project/Area Number	20H00613
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Review Section	Medium-sized Section 61: Human informatics and related fields
Research Institution	Tokyo Metropolitan University
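Principal Investigator	Nobutaka Ono, Tokyo Metropolitan University, Graduate School of Systems Design, Professor (80334259)
Co-Investigator (Kenkyū-buntansha)	Shoko Suyama (Shoko Araki), Nippon Telegraph and Telephone Corporation, NTT Communication Science Laboratories, Media Information Laboratory, Senior Research Scientist (30396212); Keisuke Imoto, Doshisha University, Faculty of Science and Engineering, Associate Professor (90802116); Sayaka Shiota, Tokyo Metropolitan University, Graduate School of Systems Design, Associate Professor (90705039); Ryoichi Miyazaki, National Institute of Technology, Tokuyama College, Department of Computer Science and Electronic Engineering, Associate Professor (40734728); Hitoshi Kiya, Tokyo Metropolitan University, Graduate School of Systems Design, Professor (40157110)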
Project Period (FY)	2020-04-01 – 2024-03-31
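Keywords	distributed microphone array / distributed acoustic sensing / speech recognition / acoustic scene recognition / source separation / synchronization / sound-to-light conversion / Blinky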
Outline of Final Research Achievements	In this study, we developed efficient algorithms for high-precision time difference estimation and sampling frequency mismatch estimation and compensation as techniques for blindly synchronizing asynchronous signals. Additionally, we extended these techniques to applications such as acoustic object cancellers and impulse response estimation under sampling frequency variations. For multimodal acoustic sensing using sound-to-light conversion, we constructed various purpose-specific methods, including not only traditional intensity conversion but also melody visualization, speech estimation using small-scale DNNs, sparse spectrum reconstruction based on compressed sensing, and optimization for acoustic scene recognition through end-to-end learning. We also confirmed the effectiveness of spatial features derived from distributed sensing for acoustic scene recognition.
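Free Research Field	Acoustic signal processing
Academic Significance and Societal Importance of the Research Achievements	Acoustic signal processing with spatially distributed microphones has conventionally required strict time synchronization, which is difficult to provide: wired connections need cumbersome cabling, and wireless links need large bandwidth. In contrast, this research established a method for synchronizing multiple everyday recording devices, such as smartphones and other mobile terminals, from the observed signals alone. This makes it possible to use distributed recording devices for array signal processing, specifically source separation, source enhancement, and the acquisition of spatial information. These capabilities contribute substantially to improving the performance of distant speech recognition and acoustic scene recognition. We also advanced our original framework for distributed acoustic sensing based on sound-to-light conversion and video cameras, and presented a new direction for acoustic scene recognition.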