2023 Fiscal Year Annual Research Report
Self-supervised graph-based representation for language and speaker detection
Project/Area Number |
21K17776
|
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
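Peng Shen (沈 鵬), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (80773118)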
|
Project Period (FY) |
2021-04-01 – 2024-03-31
|
Keywords | language identification / speech recognition / pre-training model / large language models / speaker diarization |
Outline of Annual Research Achievements |
In fiscal year 2023, I investigated how to make better use of pre-trained and self-supervised models to improve the performance of language identification (LID) and automatic speech recognition (ASR). Specifically, the following work was carried out: 1. Following the success of ChatGPT, I began investigating generative models and tried to use the knowledge in such models to improve LID performance. These investigations are important for understanding the behavior of large language models. This work was published at IEEE ASRU 2023. 2. I also worked on improving cross-domain ASR. We used pre-trained large models, such as BERT, and proposed optimal transport techniques to make better use of the knowledge transferred from these models. This work was published at IEEE ICASSP 2022 and 2024 and at IEEE ASRU 2023. 3. I also proposed a novel speaker mask branch to detect the speech segments of individual speakers. With the proposed model, both ASR and speaker diarization can be performed simultaneously using a single model.
In this project, I investigated and used models trained with self-supervised or pre-training techniques for LID, speaker recognition, and ASR tasks. Through this research, we clarified how to better utilize the knowledge inside pre-trained models and proposed several techniques, such as RNN-T-based LID and optimal transport-based ASR, to improve the performance of these tasks. In particular, our proposed techniques were successfully used to build the NICT LID system, which showed very robust performance.
|