2021 Fiscal Year Research-status Report
Self-supervised graph-based representation for language and speaker detection
Project/Area Number |
21K17776
|
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
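Shen Peng (沈 鵬), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (80773118)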
|
Project Period (FY) |
2021-04-01 – 2024-03-31
|
Keywords | speaker recognition / language identification / cross-domain / self-supervised learning / pre-training model |
Outline of Annual Research Achievements |
I investigated how to better represent speech signals for both speaker recognition and language recognition. The following work was carried out. 1. Utilizing generative and discriminative models for speaker verification: We proposed a hybrid learning framework that couples the structure and parameters of a joint Bayesian generative model with a neural discriminative learning framework to improve recognition performance (the results were published in the journal IEEE/ACM TASLP and at the international conference APSIPA). 2. Improving the representation of speech signals for language identification (LID): We proposed a novel transducer-based language embedding approach for LID that integrates an RNN transducer model into a language embedding framework. By drawing on the RNN transducer’s linguistic representation capability, the proposed method can exploit both phonetically aware acoustic features and explicit linguistic features. To reduce the influence of the cross-domain problem, we also proposed a joint distribution alignment model based on a partial optimal transport algorithm (two papers were submitted to international conferences).
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
Following the plan, I investigated how to better represent the language and speaker information in a speech signal. The work progressed smoothly, and the results were published in or submitted to a top-level journal and international conferences.
|
Strategy for Future Research Activity |
I will further investigate how to better utilize structural phonetic information to represent speech signals for speaker, language, and speech recognition tasks. I will focus on the following problems: 1. Making better use of both acoustic and linguistic features remains a challenge in language recognition, especially in a self-supervised or pre-trained learning setting. 2. Tokenization methods have recently become widely used in natural language processing tasks, so I will also study how to use tokens to build a universal model that represents multilingual speech signals.
|
Causes of Carryover |
Because of the COVID-19 pandemic, the budgets for running machines and for travel were not spent in FY2021. In FY2022, these budgets will be used to purchase high-performance computing machines for training large-scale pre-trained models.
|
Research Products
(2 results)