Construction of a computational model to deal with the cocktail-party problem for intelligent speech interface
Project/Area Number | 19K12035 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Review Section | Basic Section 61010: Perceptual information processing-related |
Research Institution | National Institute of Information and Communications Technology |
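Principal Investigator | Lu Xugang, National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (20362022) |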
Project Period (FY) | 2019-04-01 – 2022-03-31 |
Project Status | Completed (Fiscal Year 2021) |
Budget Amount |
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2021: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2020: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2019: ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
|
Keywords | Intelligent information / Generative model / Discriminative model / Model coupling / Speaker embedding / Unsupervised adaptation / Acoustic event detection / Predictive coding / Cocktail party |
Outline of Research at the Start |
We investigate the predictive coding principle as the basis for a computational model that dynamically parses incoming mixed sound sources, and we aim to apply the model to the long-standing cocktail-party problem in automatic speech recognition (ASR) research.
|
Outline of Final Research Achievements |
In cocktail-party scenarios, many kinds of information must be exploited to identify the different speech (or sound) sources. This project made the following contributions: 1. Who is speaking (speaker information) is one of the most important cues for identifying a speech source. Besides developing a speaker embedding system, we proposed a framework that couples generative and discriminative learning for speaker recognition. The framework showed a large improvement over state-of-the-art models. 2. Because the recording environment of a speech source may differ across domains, we proposed a new distance metric for unsupervised domain adaptation. We tested the proposed adaptation algorithm on both speaker and language recognition tasks and obtained promising improvements when the recording environment changed.
|
Academic Significance and Societal Importance of the Research Achievements |
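In cocktail-party scenarios with mixed speech sources, who is speaking and which language is being spoken are important prior knowledge for separating the sources. We developed new ideas and algorithms to improve speaker recognition and language recognition performance, which helps raise the quality of the prior knowledge available about the speech sources.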
|
Report
(4 results)
Research Products
(4 results)