
2022 Fiscal Year Annual Research Report

A Unified Computational Model for Audio-Visual Recognition of Human Social Interaction

Research Project

Project/Area Number 20K19833
Research Institution Institute of Physical and Chemical Research

Principal Investigator

Nugraha Aditya  RIKEN (National Research and Development Agency), Center for Advanced Intelligence Project, Research Scientist (60858025)

Project Period (FY) 2020-04-01 – 2023-03-31
Keywords Audio-visual processing / Smart glasses / Adaptive system / Blind source separation / Speech enhancement / Speech recognition / Neural spatial model / Normalizing flow
Outline of Annual Research Achievements

This study aims to formulate a probabilistic computational model of audio-visual information processing for understanding verbal communication in human social interactions. The model is based on the probabilistic local Gaussian model of a multichannel audio signal, which uses spectral parameters to describe source characteristics and spatial parameters to represent source and sensor locations in an environment. Initially, we assumed the availability of high-resolution recordings from stationary sensors (microphones and cameras), as in roundtable discussion scenarios. However, we shifted our focus to recordings captured by the non-stationary sensors of head-worn smart glasses. Users naturally move their heads and bodies in real-world scenarios, especially when interacting with multiple people in a group. Handling such recordings therefore requires a highly adaptive system that is robust to noise and reverberation. In FY2020, we developed multiple deep spectral models, including one based on speaker and phone disentanglement. In FY2021, we worked on deep spatial models, including one that integrates normalizing flows with state-of-the-art joint diagonalization techniques for spatial covariance matrices, and started to incorporate visual information into our audio-visual processing. In FY2022, we continued working on adaptive, visually informed audio signal processing, in which probable speaker locations guide the spatial parameter optimization of audio source separation or speech enhancement for speech recognition.
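The report gives no equations, so the following is only a minimal sketch of the textbook local Gaussian model that the outline refers to, not the project's implementation. It assumes each source image at frequency f and frame t is zero-mean complex Gaussian with covariance lam[n, f, t] * G[n, f], where lam plays the role of the spectral parameters and G the spatial parameters. Under that assumption, the posterior mean of each source image is given by a multichannel Wiener filter. The function name, argument names, and array layout are hypothetical choices made for this illustration.

```python
import numpy as np


def multichannel_wiener_filter(x, lam, G):
    """Posterior-mean source images under a local Gaussian model (sketch).

    x:   (F, T, M) complex mixture STFT with M microphones
    lam: (N, F, T) nonnegative power spectral densities (spectral parameters)
    G:   (N, F, M, M) Hermitian positive-definite spatial covariance
         matrices (spatial parameters)
    Returns an (N, F, T, M) complex array of estimated source images.
    """
    # Covariance of each source image: lam[n, f, t] * G[n, f]
    source_cov = lam[..., None, None] * G[:, :, None, :, :]  # (N, F, T, M, M)
    # Mixture covariance is the sum over the N sources
    mix_cov = source_cov.sum(axis=0)                         # (F, T, M, M)
    # Solve mix_cov @ z = x rather than forming an explicit inverse
    z = np.linalg.solve(mix_cov, x[..., None])               # (F, T, M, 1)
    # Wiener estimate for each source: source_cov @ mix_cov^{-1} @ x
    return (source_cov @ z)[..., 0]


# Toy data, randomly generated purely for illustration
rng = np.random.default_rng(0)
N, F, T, M = 2, 4, 5, 3
A = rng.standard_normal((N, F, M, M)) + 1j * rng.standard_normal((N, F, M, M))
G = A @ A.conj().swapaxes(-1, -2) + 1e-3 * np.eye(M)  # Hermitian, positive definite
lam = rng.random((N, F, T)) + 0.1
x = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))

est = multichannel_wiener_filter(x, lam, G)
```

Because the per-source filters sum to the identity, the estimated source images add up to the observed mixture. In an adaptive system of the kind the outline describes, lam and G would be re-estimated as the wearer moves; this sketch treats them as given.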

  • Research Products

    (9 results)


Journal Article (2 results; of which Int'l Joint Research: 2, Peer Reviewed: 2, Open Access: 1) / Presentation (6 results; of which Int'l Joint Research: 6) / Remarks (1 result)

  • [Journal Article] Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation (2022)

    • Author(s)
      Fontaine Mathieu, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 30 Pages: 1734–1748

    • DOI

      10.1109/TASLP.2022.3172631

    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation (2022)

    • Author(s)
      Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Fontaine Mathieu, Yoshii Kazuyoshi, Kawahara Tatsuya
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 30 Pages: 2368–2382

    • DOI

      10.1109/TASLP.2022.3190734

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation (2022)

    • Author(s)
      Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Int'l Joint Research
  • [Presentation] Elliptically Contoured Alpha-Stable Representation for MUSIC-Based Sound Source Localization (2022)

    • Author(s)
      Fontaine Mathieu, Di Carlo Diego, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Int'l Joint Research
  • [Presentation] Joint Localization and Synchronization of Distributed Camera-Attached Microphone Arrays for Indoor Scene Analysis (2022)

    • Author(s)
      Sumura Yoshiaki, Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Yoshii Kazuyoshi
    • Organizer
      International Workshop on Acoustic Signal Enhancement (IWAENC)
    • Int'l Joint Research
  • [Presentation] DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF (2022)

    • Author(s)
      Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      International Workshop on Acoustic Signal Enhancement (IWAENC)
    • Int'l Joint Research
  • [Presentation] Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments (2022)

    • Author(s)
      Du Yicheng, Nugraha Aditya Arie, Sekiguchi Kouhei, Bando Yoshiaki, Fontaine Mathieu, Yoshii Kazuyoshi
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Int'l Joint Research
  • [Presentation] Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments (2022)

    • Author(s)
      Sekiguchi Kouhei, Nugraha Aditya Arie, Du Yicheng, Bando Yoshiaki, Fontaine Mathieu, Yoshii Kazuyoshi
    • Organizer
      IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    • Int'l Joint Research
  • [Remarks] Demo web page for NF-FastMNMF

    • URL

      https://aanugraha.github.io/demo/nffastmnmf/


Published: 2023-12-25  
