2020 Fiscal Year Research-status Report

Construction of a computational model to deal with the cocktail-party problem for intelligent speech interface

Research Project

Project/Area Number 19K12035
Research Institution National Institute of Information and Communications Technology

Principal Investigator

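LU Xugang  National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Senior Researcher (20362022)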

Project Period (FY) 2019-04-01 – 2022-03-31
Keywords Speaker embedding / Unsupervised adaptation
Outline of Annual Research Achievements

In speech separation, speaker information is one of the most important cues. To extract it, we constructed a speaker embedding system trained on a large-scale data corpus. With this system, the speaker characteristics of each input utterance can be estimated, and the resulting speaker embedding can then be used to extract the target speaker's speech from mixed speech. Moreover, because speech may come from different recording environments, we proposed a new distance metric for unsupervised domain adaptation; preliminary experiments on a cross-channel spoken language recognition task showed promising results.
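The distance metric itself is not described in this report; the presentation listed under Research Products names optimal transport. As a rough illustration of that general idea only, the sketch below computes an entropy-regularised optimal transport (Sinkhorn) cost between two small sets of feature vectors, one per recording channel. The function names, the squared Euclidean ground cost, the uniform weights, and the `reg` and `n_iters` values are all assumptions made for illustration, not the project's actual method.

```python
import math


def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_distance(source, target, reg=1.0, n_iters=200):
    """Entropy-regularised OT cost between two feature sets (uniform weights).

    Hypothetical helper: `reg` is the entropy regularisation strength and
    `n_iters` the number of Sinkhorn scaling iterations.
    """
    n, m = len(source), len(target)
    cost = [[squared_euclidean(s, t) for t in target] for s in source]
    # Gibbs kernel K = exp(-C / reg); very large costs relative to `reg`
    # can underflow, so features are assumed to be on a moderate scale.
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on source samples
    b = [1.0 / m] * m  # uniform mass on target samples
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); the cost is <P, C>.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


# Toy features standing in for embeddings from two recording channels.
channel_a = [[0.0, 0.0], [1.0, 1.0]]
channel_b_shifted = [[3.0, 3.0], [4.0, 4.0]]

d_same = sinkhorn_distance(channel_a, channel_a)
d_shift = sinkhorn_distance(channel_a, channel_b_shifted)
print(d_same < d_shift)
```

In an adaptation setting, a cost of this kind could serve as a training loss that pulls target-channel features toward the source-channel distribution without target labels; whether the project uses it in that way is not stated here.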

Current Status of Research Progress

2: Research has progressed on the whole more than it was originally planned.

Reason

Last year, we found that speaker characteristics are important in speech separation, so we focused further on techniques for speaker feature embedding. Using a large public speech corpus for speaker recognition, we built a speaker embedding system. For this system, we proposed a generative and discriminative learning framework to extract discriminative and robust speaker information.

Strategy for Future Research Activity

Based on our previous investigations, we will pursue two directions: (1) using the speaker embeddings, we will study algorithms for tracking and separating the target speaker's speech; (2) since speech recording channels may differ from session to session, we will investigate model adaptation for cross-channel acoustic environments.
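For direction (1), one common baseline (assumed here, not taken from this report) is to compare an enrolled target-speaker embedding against the embedding of each separated stream with cosine similarity and keep the best match. The sketch below shows that scoring step only; the function names and the toy three-dimensional vectors are hypothetical, and real speaker embeddings would come from a trained embedding system.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def pick_target_stream(enrolled_embedding, stream_embeddings):
    """Return (index of best-matching stream, list of all similarity scores)."""
    scores = [cosine_similarity(enrolled_embedding, e) for e in stream_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy vectors standing in for speaker embeddings.
enrolled = [1.0, 0.0, 0.2]
streams = [
    [0.0, 1.0, 0.0],   # stream 0: dissimilar speaker
    [0.9, 0.1, 0.3],   # stream 1: close to the enrolled speaker
]

best, scores = pick_target_stream(enrolled, streams)
print(best)
```

Tracking over time could repeat this scoring per segment, although the report does not say which tracking algorithm will be studied.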

Causes of Carryover

Due to COVID-19, the budget for business trips and for workstations for data recording was not used. Under the new fiscal year's plan, the workstation will be purchased.

  • Research Products

    (1 results)

All Presentation (1 results) (of which Int'l Joint Research: 1 results)

  • [Presentation] UNSUPERVISED NEURAL ADAPTATION MODEL BASED ON OPTIMAL TRANSPORT FOR SPOKEN LANGUAGE IDENTIFICATION 2020

    • Author(s)
      Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai
    • Organizer
      ICASSP2021
    • Int'l Joint Research

Published: 2021-12-27  