2021 Fiscal Year Final Research Report
Construction of a computational model to deal with the cocktail-party problem for intelligent speech interface
Project/Area Number |
19K12035
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 61010:Perceptual information processing-related
|
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
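Lu Xugang National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (20362022)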
|
Project Period (FY) |
2019-04-01 – 2022-03-31
|
Keywords | Intelligent information |
Outline of Final Research Achievements |
In cocktail-party scenarios, many kinds of information must be exploited to identify the different speech (or sound) sources. This project made the following contributions: 1. For identifying a speech source, who is speaking (speaker information) is one of the most important cues. Besides developing a speaker embedding system, we proposed a coupling of generative and discriminative learning for speaker recognition. Our framework showed a large improvement over state-of-the-art models. 2. Because speech recording environments may differ across domains, we proposed a new distance metric for unsupervised domain adaptation. We tested the proposed adaptation algorithm on both speaker and language recognition tasks and obtained promising improvements when the recording environment changed.
|
Free Research Field |
Artificial intelligence, signal processing
|
Academic Significance and Societal Importance of the Research Achievements |
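In cocktail-party scenarios with mixed speech sources, who is speaking and which language is being used are important prior knowledge for separating the sources. We developed new ideas and algorithms to improve speaker recognition and language recognition performance. This helps raise the quality of the prior knowledge available about the speech sources.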
|