
2022 Fiscal Year Final Research Report

A Unified Computational Model for Audio-Visual Recognition of Human Social Interaction

Research Project

Project/Area Number: 20K19833
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type: Multi-year Fund
Review Section: Basic Section 61010: Perceptual information processing-related
Research Institution: Institute of Physical and Chemical Research

Principal Investigator

NUGRAHA Aditya Arie  RIKEN, Center for Advanced Intelligence Project, Researcher (60858025)

Project Period (FY): 2020-04-01 – 2023-03-31
Keywords: Audio-visual processing / Smart glasses / Adaptive system / Blind source separation / Speech enhancement / Speech recognition / Neural spatial model / Generative model
Outline of Final Research Achievements

We aimed to develop a probabilistic computational model of audio-visual information processing for understanding human verbal communication. We proposed a model that generates speech signals from speaker labels, which control the voice characteristics, and phone labels, which control the speech content. For speech enhancement, this model potentially improves not only the signal quality but also the speech intelligibility. We also introduced principled time-varying extensions, based on normalizing flows (a class of deep generative models), of time-invariant blind source separation (BSS) methods, including the classical independent vector analysis and the state-of-the-art FastMNMF. Finally, we developed adaptive audio-visual speech enhancement with augmented reality smart glasses, in which camera images allow speakers of interest to be identified and used to control direction-aware enhancement. We achieved robust low-latency enhancement via fast environment-sensitive beamforming governed by slow environment-agnostic BSS.
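To illustrate the "direction-aware enhancement" idea in the simplest possible terms, the sketch below implements a classical frequency-domain delay-and-sum beamformer on one frequency bin. This is not the project's actual system (which couples fast beamforming with a slower BSS stage); it is a minimal, self-contained NumPy example under far-field plane-wave assumptions, and all function names and parameters are illustrative.

```python
import numpy as np

def steering_vector(mic_positions, direction, freq, c=343.0):
    """Far-field steering vector: per-mic phase of a plane wave from `direction`.

    mic_positions: (M, 3) microphone coordinates in metres.
    direction: unit vector pointing from the array toward the source.
    freq: frequency in Hz; c: speed of sound in m/s.
    """
    delays = mic_positions @ direction / c          # relative delays in seconds, shape (M,)
    return np.exp(-2j * np.pi * freq * delays)      # complex phases, shape (M,)

def delay_and_sum(stft_frames, mic_positions, direction, freq):
    """Delay-and-sum beamformer on one frequency bin.

    stft_frames: (M, T) complex STFT values for M mics and T frames.
    Returns the (T,) enhanced single-channel frames.
    """
    a = steering_vector(mic_positions, direction, freq)
    w = a / len(a)                                  # uniform delay-and-sum weights
    return w.conj() @ stft_frames                   # coherent sum toward `direction`

# Usage: a 4-mic linear array and a source exactly on the steering direction.
mics = np.array([[0.00, 0, 0], [0.04, 0, 0], [0.08, 0, 0], [0.12, 0, 0]])
d = np.array([1.0, 0.0, 0.0])
s = np.array([1.0 + 0j, 0.5j])                      # clean frames, shape (T=2,)
x = np.outer(steering_vector(mics, d, 1000.0), s)   # simulated multichannel mixture
y = delay_and_sum(x, mics, d, 1000.0)               # recovers s exactly (no noise)
```

Signals from the steered direction add in phase and are preserved, while signals from other directions are attenuated; in the project's setting, the steering direction would come from the speaker identified in the camera images.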

Free Research Field

Audio-visual speech enhancement for smart glasses

Academic Significance and Societal Importance of the Research Achievements

One key achievement is the prototype of adaptive speech enhancement for real-time speech transcription with head-worn smart glasses. It involves challenging egocentric information processing with non-stationary sensors. This technology may benefit older adults and people with hearing impairment.


Published: 2024-01-30  
