
A Unified Computational Model for Audio-Visual Recognition of Human Social Interaction

Research Project

Project/Area Number 20K19833
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution Institute of Physical and Chemical Research

Principal Investigator

NUGRAHA Aditya Arie, RIKEN, Center for Advanced Intelligence Project, Research Scientist (60858025)

Project Period (FY) 2020-04-01 – 2023-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2022: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2021: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2020: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords Audio-visual processing / Smart glasses / Adaptive system / Blind source separation / Speech enhancement / Speech recognition / Neural spatial model / Generative model / Normalizing flow / Dereverberation / Deep spatial model / Deep speech model / Deep generative model / Latent variable model / Variational autoencoder / Probabilistic model / Speaker diarization
Outline of Research at the Start

We aim to form a unified computational model of audio-visual scene understanding that mimics the human capability to exploit audio and visual cues. We expect the model to improve front-end processes (e.g., speech enhancement) and back-end processes (e.g., speech recognition) in a mutually beneficial manner.

Outline of Final Research Achievements

We aimed to develop a probabilistic computational model of audio-visual information processing for understanding human verbal communication. We proposed a model that generates speech signals from speaker labels, which control the voice characteristics, and phone labels, which control the speech content. In speech enhancement, such a model can potentially improve not only the signal quality but also the speech intelligibility. We also introduced principled time-varying extensions of time-invariant blind source separation (BSS) methods, including the classical independent vector analysis and the state-of-the-art FastMNMF, based on a novel deep generative model called normalizing flow. Finally, we developed adaptive audio-visual speech enhancement with augmented reality smart glasses. Camera images are used to identify the speakers of interest, and the identified speakers then steer the direction-aware enhancement. We achieve robust low-latency enhancement through fast, environment-sensitive beamforming governed by slow, environment-agnostic BSS.
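As a rough illustration of the "fast beamforming governed by slow BSS" structure described above, the Python sketch below (using NumPy) applies a beamformer frame by frame and refreshes its spatial statistics only once per block. Several pieces are assumptions, not the project's method: the fast stage is a standard MVDR beamformer; the slow stage is a hypothetical `bss_update` callable (the IWAENC 2022 presentation listed in this record uses block-online FastMNMF for that role, which is not implemented here); `sample_cov` is a placeholder that merely averages outer products; the steering vectors (e.g., derived from a camera-identified speaker direction) are taken as given; and the block length of 50 frames is arbitrary.

```python
import numpy as np


def mvdr_weights(noise_cov, steering):
    """MVDR weights for one frequency bin (a standard choice, assumed here).

    noise_cov: (M, M) Hermitian positive-definite spatial covariance.
    steering:  (M,) steering vector toward the speaker of interest.
    Returns w with w^H @ steering == 1 (distortionless toward the target).
    """
    rinv_d = np.linalg.solve(noise_cov, steering)  # R^{-1} d without an explicit inverse
    return rinv_d / (steering.conj() @ rinv_d)     # divide by d^H R^{-1} d


def all_bin_weights(noise_cov, steering):
    """Stack MVDR weights for all F bins: (F, M, M), (F, M) -> (F, M)."""
    return np.stack([mvdr_weights(r, d) for r, d in zip(noise_cov, steering)])


def run_online(stft, steering, bss_update, block=50):
    """Frame-online beamforming whose statistics are refreshed block by block.

    stft:       (T, F, M) multichannel STFT coefficients.
    steering:   (F, M) steering vectors for the speaker of interest.
    bss_update: hypothetical slow stage mapping a (block, F, M) chunk to
                (F, M, M) covariances; it stands in for a BSS method.
    """
    T, F, M = stft.shape
    noise_cov = np.tile(np.eye(M, dtype=complex), (F, 1, 1))  # identity until first update
    weights = all_bin_weights(noise_cov, steering)
    out = np.empty((T, F), dtype=complex)
    for t in range(T):
        # Fast path: y_f = w_f^H x_f using only the current frame (low latency).
        out[t] = np.einsum("fm,fm->f", weights.conj(), stft[t])
        if (t + 1) % block == 0:
            # Slow path: re-estimate statistics from the last `block` frames;
            # the new weights apply to the frames that follow.
            noise_cov = bss_update(stft[t + 1 - block : t + 1])
            weights = all_bin_weights(noise_cov, steering)
    return out


def sample_cov(chunk):
    """Placeholder slow stage (NOT FastMNMF): block sample covariance
    plus small diagonal loading so that it stays invertible."""
    M = chunk.shape[-1]
    cov = np.einsum("tfm,tfn->fmn", chunk, chunk.conj()) / len(chunk)
    return cov + 1e-6 * np.eye(M)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, F, M = 200, 4, 2
    steering = rng.standard_normal((F, M)) + 1j * rng.standard_normal((F, M))
    source = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
    stft = source[:, :, None] * steering[None]  # target only: x_tf = s_tf * d_f
    enhanced = run_online(stft, steering, sample_cov, block=50)
    print(np.allclose(enhanced, source))  # target along d passes through unchanged
```

Because the MVDR weights are normalized so that w^H d = 1, a signal arriving exactly along the steering vector passes through unchanged for any Hermitian positive-definite covariance the slow stage supplies; the slow stage only changes how signals from other directions are suppressed. This is one reason such a split can tolerate infrequent, coarse updates from the slower stage.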

Academic Significance and Societal Importance of the Research Achievements

One key achievement is a prototype of adaptive speech enhancement for real-time speech transcription with head-worn smart glasses. It involves challenging egocentric information processing with non-stationary (head-mounted, moving) sensors. This technology may benefit older adults and people with hearing impairments.

Report

(4 results)
  • 2022 Annual Research Report
  • Final Research Report (PDF)
  • 2021 Research-status Report
  • 2020 Research-status Report

Research Products

(24 results)


Journal Article (6 results; of which Int'l Joint Research: 6, Peer Reviewed: 6, Open Access: 4) / Presentation (14 results; of which Int'l Joint Research: 11) / Remarks (4 results)

  • [Journal Article] Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation (2022)

    • Author(s)
      Fontaine Mathieu, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 30 Pages: 1734-1748

    • DOI

      10.1109/taslp.2022.3172631

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Autoregressive Moving Average Jointly-Diagonalizable Spatial Covariance Analysis for Joint Source Separation and Dereverberation (2022)

    • Author(s)
      Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Fontaine Mathieu, Yoshii Kazuyoshi, Kawahara Tatsuya
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 30 Pages: 2368-2382

    • DOI

      10.1109/taslp.2022.3190734

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Neural Full-Rank Spatial Covariance Analysis for Blind Source Separation (2021)

    • Author(s)
      Yoshiaki Bando, Kouhei Sekiguchi, Yoshiki Masuyama, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
    • Journal Title

      IEEE Signal Processing Letters

      Volume: 28 Pages: 1670-1674

    • DOI

      10.1109/lsp.2021.3101699

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] A Flow-Based Deep Latent Variable Model for Speech Spectrogram Modeling and Enhancement (2020)

    • Author(s)
      Nugraha Aditya Arie, Sekiguchi Kouhei, Yoshii Kazuyoshi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 1104-1117

    • DOI

      10.1109/taslp.2020.2979603

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation (2020)

    • Author(s)
      Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 2610-2625

    • DOI

      10.1109/taslp.2020.3019181

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Flow-Based Independent Vector Analysis for Blind Source Separation (2020)

    • Author(s)
      Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
    • Journal Title

      IEEE Signal Processing Letters

      Volume: 27 Pages: 2173-2177

    • DOI

      10.1109/lsp.2020.3039944

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation (2022)

    • Author(s)
      Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Elliptically Contoured Alpha-Stable Representation for MUSIC-Based Sound Source Localization (2022)

    • Author(s)
      Fontaine Mathieu, Di Carlo Diego, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Joint Localization and Synchronization of Distributed Camera-Attached Microphone Arrays for Indoor Scene Analysis (2022)

    • Author(s)
      Sumura Yoshiaki, Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Yoshii Kazuyoshi
    • Organizer
      International Workshop on Acoustic Signal Enhancement (IWAENC)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF (2022)

    • Author(s)
      Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      International Workshop on Acoustic Signal Enhancement (IWAENC)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments (2022)

    • Author(s)
      Du Yicheng, Nugraha Aditya Arie, Sekiguchi Kouhei, Bando Yoshiaki, Fontaine Mathieu, Yoshii Kazuyoshi
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments (2022)

    • Author(s)
      Sekiguchi Kouhei, Nugraha Aditya Arie, Du Yicheng, Bando Yoshiaki, Fontaine Mathieu, Yoshii Kazuyoshi
    • Organizer
      IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Alpha-Stable Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Speech Enhancement and Dereverberation (2021)

    • Author(s)
      Fontaine Mathieu, Sekiguchi Kouhei, Nugraha Aditya Arie, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Autoregressive Fast Multichannel Nonnegative Matrix Factorization for Joint Blind Source Separation and Dereverberation (2021)

    • Author(s)
      Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Fontaine Mathieu, Yoshii Kazuyoshi
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Determined Blind Source Separation Based on NF-IVA with Time-Varying Linear Transformations (2021)

    • Author(s)
      Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      Acoustical Society of Japan (ASJ) Spring Meeting
    • Related Report
      2021 Research-status Report
  • [Presentation] Joint Blind Source Separation and Dereverberation Based on ARMA-FastMNMF (2021)

    • Author(s)
      Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Fontaine Mathieu, Yoshii Kazuyoshi
    • Organizer
      Acoustical Society of Japan (ASJ) Spring Meeting
    • Related Report
      2021 Research-status Report
  • [Presentation] Unsupervised Source Separation with Deep Spatial Models (2021)

    • Author(s)
      Nugraha Aditya Arie, Sekiguchi Kouhei, Fontaine Mathieu, Bando Yoshiaki, Yoshii Kazuyoshi
    • Organizer
      RIKEN-AIP Open Seminar
    • Related Report
      2021 Research-status Report
  • [Presentation] Unsupervised Robust Speech Enhancement Based on Alpha-Stable Fast Multichannel Nonnegative Matrix Factorization (2020)

    • Author(s)
      Fontaine Mathieu, Sekiguchi Kouhei, Nugraha Aditya Arie, Yoshii Kazuyoshi
    • Organizer
      Annual Conference of the International Speech Communication Association (Interspeech)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Fast Multichannel Correlated Tensor Factorization for Blind Source Separation (2020)

    • Author(s)
      Yoshii Kazuyoshi, Sekiguchi Kouhei, Bando Yoshiaki, Fontaine Mathieu, Nugraha Aditya Arie
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Semi-supervised Multichannel Speech Separation Based on a Phone- and Speaker-Aware Deep Generative Model of Speech Spectrograms (2020)

    • Author(s)
      Du Yicheng, Sekiguchi Kouhei, Bando Yoshiaki, Nugraha Aditya Arie, Fontaine Mathieu, Yoshii Kazuyoshi, Kawahara Tatsuya
    • Organizer
      European Signal Processing Conference (EUSIPCO)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Remarks] Demo web page for NF-FastMNMF

    • URL

      https://aanugraha.github.io/demo/nffastmnmf/

    • Related Report
      2022 Annual Research Report
  • [Remarks] Demo web page for Neural FCA

    • URL

      https://ybando.jp/projects/spl2021/

    • Related Report
      2021 Research-status Report
  • [Remarks] Demo web page for NF-IVA

    • URL

      https://aanugraha.github.io/demo/nfiva/

    • Related Report
      2021 Research-status Report
  • [Remarks] Demo web page for GF-VAE

    • URL

      https://aanugraha.github.io/demo/gfvae/

    • Related Report
      2021 Research-status Report


Published: 2020-04-28   Modified: 2024-01-30  


Powered by NII kakenhi