2021 Fiscal Year Final Research Report

Construction of a computational model to deal with the cocktail-party problem for intelligent speech interface

Project/Area Number	19K12035
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61010:Perceptual information processing-related
Research Institution	National Institute of Information and Communications Technology
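Principal Investigator	Lu Xugang, National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (20362022)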
Project Period (FY)	2019-04-01 – 2022-03-31
Keywords	Intelligent informatics
Outline of Final Research Achievements	In cocktail-party scenarios, many kinds of information must be exploited to identify the different speech (or sound) sources. This project made the following contributions:
1. Who is speaking (speaker information) is one of the most important cues for identifying a speech source. Besides developing a speaker embedding system, we proposed a framework that couples generative and discriminative learning for speaker recognition. This framework showed a large improvement over state-of-the-art models.
2. Speech recording environments may differ across domains. To handle this, we proposed a new distance metric for unsupervised domain adaptation. We tested the proposed adaptation algorithm on both speaker recognition and language recognition tasks. It gave promising improvements when the recording environment changed.
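The report does not specify the proposed distance metric. The sketch below is therefore only a point of reference and not the project's method. It is a minimal Python/NumPy computation of the biased squared maximum mean discrepancy (MMD) with a Gaussian kernel. MMD is a widely used measure of how far apart source-domain and target-domain feature distributions are in unsupervised domain adaptation. The function names, the kernel bandwidth `sigma`, and the synthetic "embeddings" are illustrative assumptions.

```python
import numpy as np

# Illustrative only: a standard MMD estimate, NOT the distance metric
# proposed in this project (the report does not name that metric).


def gaussian_kernel(x: np.ndarray, y: np.ndarray, sigma: float) -> np.ndarray:
    """Pairwise Gaussian (RBF) kernel matrix between the rows of x and y."""
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq / (2.0 * sigma**2))


def mmd2_biased(source: np.ndarray, target: np.ndarray, sigma: float) -> float:
    """Biased (V-statistic) estimate of squared MMD; rows are samples."""
    k_ss = gaussian_kernel(source, source, sigma)
    k_tt = gaussian_kernel(target, target, sigma)
    k_st = gaussian_kernel(source, target, sigma)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 16-dim embeddings: a source domain, a target domain drawn
    # from the same distribution, and a target domain with a simulated shift
    # (standing in for a change of recording environment).
    src = rng.normal(0.0, 1.0, size=(200, 16))
    tgt_same = rng.normal(0.0, 1.0, size=(200, 16))
    tgt_shift = rng.normal(1.0, 1.0, size=(200, 16))
    print("same condition   :", mmd2_biased(src, tgt_same, sigma=4.0))
    print("shifted condition:", mmd2_biased(src, tgt_shift, sigma=4.0))
```

A small value suggests that the two embedding sets are similarly distributed. Adaptation methods of this general kind typically add such a distance as a training penalty, so that a shared encoder maps source and target recordings to similar distributions. The result depends strongly on `sigma`. Here it is set to the same order as the typical pairwise distance between the 16-dimensional samples, which keeps the kernel values informative.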
Free Research Field	Artificial intelligence, signal processing
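Academic Significance and Societal Importance of the Research Achievements	In cocktail-party scenarios with mixed speech sources, knowing who is speaking and which language is being used is important prior knowledge for separating the sources. We developed new ideas and algorithms to improve speaker recognition and language recognition performance. These improvements raise the quality of the prior knowledge available about each speech source.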