2022 Fiscal Year Research-status Report
Self-supervised graph-based representation for language and speaker detection
Project/Area Number | 21K17776 |
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator | SHEN Peng (沈 鵬), National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Universal Communication Research Institute, Senior Researcher (80773118) |
Project Period (FY) | 2021-04-01 – 2024-03-31 |
Keywords | language identification / speech recognition / cross-domain / pre-training model / self-supervised learning |
Outline of Annual Research Achievements |
I focused on investigating how to better represent speech signals for both language identification and speech recognition tasks. In detail, the following work was carried out to advance this project:

1. Improving the representation of speech signals for language identification (LID): I proposed a novel transducer-based language embedding approach for LID by integrating an RNN transducer model into a language embedding framework. Benefiting from the RNN transducer's linguistic representation capability, the proposed method exploits both phonetically aware acoustic features and explicit linguistic features for LID. The research paper was accepted by Interspeech 2022. Additionally, I applied these techniques to the NICT LID system, which also demonstrated robustness on cross-channel data.

2. Improving RNN-T for Mandarin ASR: I proposed a novel pronunciation-aware unique character encoding for building end-to-end RNN-T-based Mandarin ASR systems. The proposed encoding combines a pronunciation-based syllable with a character index (CI). By introducing the CI, the RNN-T model overcomes the homophone problem while still exploiting pronunciation information when extracting modeling units. With the proposed encoding, model outputs can be converted into the final recognition result through a one-to-one mapping. This paper was accepted by IEEE SLT 2022.
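The idea of the pronunciation-aware unique character encoding can be illustrated with a minimal sketch: each character is mapped to its syllable plus a character index (CI) that distinguishes homophones sharing that syllable, making the encoding invertible. The lexicon and the `build_encoding` helper below are illustrative assumptions, not the actual inventory or implementation from the SLT 2022 paper.

```python
# Hypothetical sketch of a pronunciation-aware unique character encoding.
# Each character maps to (syllable, CI); the CI disambiguates homophones,
# so decoding model outputs back to text is a one-to-one reverse lookup.
# The toy lexicon (toneless pinyin) is illustrative only.

def build_encoding(lexicon):
    """lexicon: dict mapping character -> syllable."""
    counts = {}      # running homophone counter per syllable
    char2unit = {}   # character -> (syllable, CI) modeling unit
    unit2char = {}   # (syllable, CI) -> character, the one-to-one inverse
    for char, syl in lexicon.items():
        ci = counts.get(syl, 0)      # next free index for this syllable
        counts[syl] = ci + 1
        char2unit[char] = (syl, ci)
        unit2char[(syl, ci)] = char
    return char2unit, unit2char

lexicon = {"是": "shi", "事": "shi", "市": "shi", "他": "ta"}
enc, dec = build_encoding(lexicon)

units = [enc[c] for c in "他是"]        # units the model would emit
text = "".join(dec[u] for u in units)   # recovered via one-to-one mapping
```

Here the three homophones of "shi" receive distinct CIs (0, 1, 2), so the pronunciation information is kept in the unit while the mapping back to characters stays unambiguous.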
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
Following the plan, I investigated how to better represent languages for language identification and modeling units for ASR tasks. The work progressed smoothly, and the research results were published at top-tier international conferences.
|
Strategy for Future Research Activity |
I will focus on investigating how to build a single universal model covering speaker, language, and speech recognition tasks. I will concentrate on the following problems: 1. Investigating large universal models, such as Whisper, and attempting to train or fine-tune similar large models. 2. Building a joint-task training framework based on a pre-trained large speech representation model.
|
Causes of Carryover |
Due to the COVID-19 pandemic, the budget allocated for travel expenses was not used. This fiscal year, it will be reallocated to purchasing machines and devices for data preparation and for building demo systems.
|
Research Products
(2 results)