2022 Fiscal Year Annual Research Report

A Unified Computational Model for Audio-Visual Recognition of Human Social Interaction

Research Project

Project/Area Number	20K19833
Research Institution	Institute of Physical and Chemical Research
Principal Investigator	Nugraha Aditya RIKEN (National Research and Development Agency), Center for Advanced Intelligence Project, Researcher (60858025)
Project Period (FY)	2020-04-01 – 2023-03-31
Keywords	Audio-visual processing / Smart glasses / Adaptive system / Blind source separation / Speech enhancement / Speech recognition / Neural spatial model / Normalizing flow
Outline of Annual Research Achievements	This study aims to formulate a probabilistic computational model of audio-visual information processing for understanding verbal communication in human social interactions. The model builds on the probabilistic local Gaussian model of a multichannel audio signal, which uses spectral parameters to describe source characteristics and spatial parameters to represent source and sensor locations in an environment. Initially, we assumed the availability of high-resolution recordings from stationary sensors (microphones and cameras), as in roundtable discussion scenarios. However, we decided to shift our focus to recordings captured by the non-stationary sensors of head-worn smart glasses. Users naturally move their heads and bodies in real-world scenarios, especially when interacting with multiple people in a group. Processing such recordings therefore requires a highly adaptive system that is robust to noise and reverberation. In FY2020, we developed multiple deep spectral models, including one based on speaker and phone disentanglement. In FY2021, we worked on deep spatial models, including one that integrates normalizing flows with state-of-the-art joint diagonalization techniques for spatial covariance matrices, and started to incorporate visual information into our audio-visual processing. In FY2022, we continued working on adaptive, visually informed audio signal processing, in which probable speaker locations guide the spatial parameter optimization of audio source separation or speech enhancement for speech recognition.
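
As an illustration of the local Gaussian model mentioned in the outline, the sketch below uses its standard textbook form, which is not necessarily the exact formulation used in this project. Each source image in each time-frequency bin is treated as a zero-mean complex Gaussian vector whose covariance is a scalar spectral variance times a spatial covariance matrix, and each source is estimated with the resulting multichannel Wiener filter. The function name, argument names, and array shapes are assumptions made for this example, and the parameters are taken as given rather than estimated.

```python
import numpy as np

def multichannel_wiener_filter(x, lam, G):
    """Posterior-mean source images under a local Gaussian model (illustrative).

    x:   (F, T, M) complex mixture STFT with M microphones
    lam: (N, F, T) nonnegative spectral variances of N sources
    G:   (N, F, M, M) Hermitian positive definite spatial covariance matrices
    Returns an (N, F, T, M) array of estimated source images.
    """
    # Covariance of source n at bin (f, t): lam[n, f, t] * G[n, f]
    R = lam[..., None, None] * G[:, :, None, :, :]    # (N, F, T, M, M)
    # Sources are assumed independent, so the mixture covariance is the sum
    R_mix = R.sum(axis=0)                             # (F, T, M, M)
    # Solve R_mix z = x in every bin instead of forming an explicit inverse
    z = np.linalg.solve(R_mix, x[..., None])          # (F, T, M, 1)
    # Wiener estimate of source n: R_n R_mix^{-1} x
    return (R @ z[None])[..., 0]                      # (N, F, T, M)
```

Because the per-source covariances sum to the mixture covariance, the estimated source images add up to the observed mixture, which gives a quick sanity check on any parameter set.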

Research Products
(9 results)

Journal Article (2 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 2 results, Open Access: 1 result) Presentation (6 results) (of which Int'l Joint Research: 6 results) Remarks (1 result)

[Journal Article] Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation (2022)
- Author(s)
  Fontaine Mathieu, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 30 Pages: 1734-1748
- DOI
  10.1109/TASLP.2022.3172631
- Peer Reviewed / Int'l Joint Research
[Journal Article] Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation (2022)
- Author(s)
  Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Fontaine Mathieu, Yoshii Kazuyoshi, Kawahara Tatsuya
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 30 Pages: 2368-2382
- DOI
  10.1109/TASLP.2022.3190734
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation (2022)
- Author(s)
  Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Elliptically Contoured Alpha-Stable Representation for MUSIC-Based Sound Source Localization (2022)
- Author(s)
  Fontaine Mathieu, Di Carlo Diego, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Int'l Joint Research
[Presentation] Joint Localization and Synchronization of Distributed Camera-Attached Microphone Arrays for Indoor Scene Analysis (2022)
- Author(s)
  Sumura Yoshiaki, Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Yoshii Kazuyoshi
- Organizer
  International Workshop on Acoustic Signal Enhancement (IWAENC)
- Int'l Joint Research
[Presentation] DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF (2022)
- Author(s)
  Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
- Organizer
  International Workshop on Acoustic Signal Enhancement (IWAENC)
- Int'l Joint Research
[Presentation] Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments (2022)
- Author(s)
  Du Yicheng, Nugraha Aditya Arie, Sekiguchi Kouhei, Bando Yoshiaki, Fontaine Mathieu, Yoshii Kazuyoshi
- Organizer
  Annual Conference of the International Speech Communication Association (Interspeech)
- Int'l Joint Research
[Presentation] Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments (2022)
- Author(s)
  Sekiguchi Kouhei, Nugraha Aditya Arie, Du Yicheng, Bando Yoshiaki, Fontaine Mathieu, Yoshii Kazuyoshi
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Int'l Joint Research
[Remarks] Demo web page for NF-FastMNMF
- URL
  https://aanugraha.github.io/demo/nffastmnmf/
