Project/Area Number |
19K12035
|
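Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
LU Xugang National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Senior Researcher (20362022)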
|
Project Period (FY) |
2019-04-01 – 2022-03-31
|
Keywords | Acoustic event detection / Speaker embedding |
Outline of Annual Research Achievements |
To construct a smart speech interface for real applications, we need to discriminate among sound sources that play different roles and convey different information: acoustic environments (different sound events and scenes) and speaker attributes (gender, identity, speaking segments, and spoken language). Accordingly, we first constructed a deep learning system for acoustic event detection (to identify the acoustic sources), and then built a speaker embedding system to characterize speakers' attributes. More specifically, we proposed a learning algorithm based on a class-wise centroid distance metric, which showed improved performance in discriminating acoustic events. In addition, we constructed a speaker embedding system.
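The report does not give the details of the class-wise centroid distance metric, so the following is only a generic sketch of the idea and not the algorithm proposed in this project. It assumes that each class is represented by the mean of its embeddings, that squared Euclidean distance is used, and that the loss is a softmax cross-entropy over negative distances. The function names `class_centroids`, `centroid_distance_loss`, and `predict` are hypothetical.

```python
import numpy as np


def class_centroids(embeddings, labels, num_classes):
    """Mean embedding per class (assumes every class appears at least once)."""
    centroids = np.zeros((num_classes, embeddings.shape[1]))
    for c in range(num_classes):
        centroids[c] = embeddings[labels == c].mean(axis=0)
    return centroids


def squared_distances(embeddings, centroids):
    """Squared Euclidean distance from every sample to every centroid, shape (N, C)."""
    diffs = embeddings[:, None, :] - centroids[None, :, :]
    return (diffs ** 2).sum(axis=2)


def centroid_distance_loss(embeddings, labels, centroids):
    """Cross-entropy over softmax(-distance): small when each sample is
    close to its own class centroid and far from the others."""
    logits = -squared_distances(embeddings, centroids)
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def predict(embeddings, centroids):
    """Assign each sample to the class with the nearest centroid."""
    return squared_distances(embeddings, centroids).argmin(axis=1)
```

In a real system the embeddings would come from a neural network and the loss would be minimized by gradient descent; here plain NumPy arrays stand in for the network output.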
|
Current Status of Research Progress (Category) |
2: Generally progressing smoothly
Reason
As we carried out the research according to our original plan, we found more detailed and specific problems that need to be solved before moving on to the next step. In real application scenarios, the acoustic environments are rather complex (non-speech acoustic events, multiple speakers). Rather than conducting a simple study of speech separation for different speakers in a controlled acoustic environment (as originally planned), we looked more deeply into real acoustic environments with unexpected acoustic events and unknown speakers. Therefore, we investigated acoustic event and scene detection and speaker embedding techniques in order to use them for accurate source separation.
|
Strategy for Future Research Activity |
Speaker attribute description is important for speech separation. For unknown speakers, we need to investigate a universal speaker feature description for separating different speakers. Our initial experiments showed that speaker embedding is one of the most efficient approaches. In learning speaker embeddings, the kind of loss metric used is essential. Going forward, we will focus on investigating efficient distance metric learning for discriminating speakers.
|
Reason for Carryover to the Next Fiscal Year |
Because of COVID-19, I could not make the planned business trips to international conferences. Planned usage: attending the international conferences INTERSPEECH 2020 (Oct.), SLT 2020 (may be delayed to Jan. 2021), and Odyssey 2020 (Nov.).
|