FY2019 Research Status Report

Construction of a computational model to deal with the cocktail-party problem for intelligent speech interface

Research Project

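Project/Area Number	19K12035
Research Institution	National Institute of Information and Communications Technology
Principal Investigator	LU Xugang, National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Senior Researcher (20362022)
Project Period (FY)	2019-04-01 – 2022-03-31
Keywords	Acoustic event detection / Speaker embedding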
Outline of Research Achievements	To construct a smart speech interface for real applications, we need to discriminate among sound sources that play different roles and convey different information: acoustic environments (different sound events and scenes) and speaker attributes (gender, identity, and speaking segmentation, as well as spoken language). Accordingly, we first constructed a deep learning system for acoustic event detection (to identify the acoustic sources), and then built a speaker embedding system to characterize speakers' attributes. More specifically, we proposed a learning algorithm based on a class-wise centroid distance metric, which showed improved performance in discriminating acoustic events. In addition, we constructed a speaker embedding system.
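The idea of learning with a class-wise centroid distance can be illustrated with a minimal sketch. This is a generic centroid-based formulation written for illustration only, not the loss used in the reported work: the function names, the `margin` parameter, and the specific combination of an intra-class pull term with an inter-class hinge term are all assumptions. Embeddings are plain lists of floats so that the example needs only the standard library.

```python
from collections import defaultdict
import math


def class_centroids(embeddings, labels):
    """Return {label: mean embedding vector} for each class."""
    grouped = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_distance_loss(embeddings, labels, margin=1.0):
    """Hypothetical loss: pull samples toward their own class centroid
    and penalize pairs of centroids that are closer than `margin`."""
    centroids = class_centroids(embeddings, labels)
    # Intra-class term: mean squared distance from each sample to its centroid.
    intra = sum(
        euclidean(vec, centroids[label]) ** 2
        for vec, label in zip(embeddings, labels)
    ) / len(embeddings)
    # Inter-class term: hinge penalty averaged over all centroid pairs.
    keys = sorted(centroids)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = sum(
        max(0.0, margin - euclidean(centroids[a], centroids[b]))
        for a, b in pairs
    ) / max(len(pairs), 1)
    return intra + inter


def nearest_centroid(vec, centroids):
    """Classify a vector by the label of its closest class centroid."""
    return min(centroids, key=lambda label: euclidean(vec, centroids[label]))


if __name__ == "__main__":
    # Toy 2-D embeddings for two hypothetical acoustic event classes.
    embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
    labels = ["dog", "dog", "siren", "siren"]
    print(centroid_distance_loss(embeddings, labels, margin=1.0))
    print(nearest_centroid([3.5, 1.0], class_centroids(embeddings, labels)))
```

In a real system the embeddings would come from a neural network and the loss would be minimized by gradient descent; the sketch only shows how class centroids and distances to them can enter a training objective and a decision rule.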
Current Progress (Category)	2: Progressing generally smoothly. Reason: As we carried out the research according to our original plan, we found more detailed and specific problems that must be solved before moving on to the next step. In real application scenarios, the acoustic environments are rather complex (non-speech acoustic events, multiple speakers). Rather than studying speech separation for different speakers in a simple, controlled acoustic environment (as originally planned), we examined in depth real acoustic environments with unexpected acoustic events and unknown speakers. Therefore, we investigated acoustic event and scene detection and speaker embedding techniques in order to utilize them for accurate source separation.
Strategy for Future Research	Speaker attribute description is important for speech separation. For unknown speakers, we need to investigate a universal speaker feature description for separating different speakers. Our initial experiments showed that speaker embedding is one of the most efficient approaches. In learning speaker embeddings, what kind of loss metric to use is an essential question. Going forward, we will focus on investigating efficient distance metric learning for discriminating speakers.
Reason for Carryover to Next Fiscal Year	Due to COVID-19, I could not attend the international conferences for which business trips had been planned. Planned usage: attending the international conferences INTERSPEECH 2020 (Oct.), SLT 2020 (may be delayed to Jan. 2021), and Odyssey 2020 (Nov.).

Research Products
(1 result)

All: Presentation (1 result) (of which International Conference: 1 result)

[Presentation] Class-Wise Centroid Distance Metric Learning for Acoustic Event Detection (2019)
- Author(s)/Presenter(s)
  Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai
- Organizer/Conference
  Interspeech 2019
- International Conference