FY2023 Annual Research Report

Self-supervised graph-based representation for language and speaker detection

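Research Project

Project/Area Number	21K17776
Research Institution	National Institute of Information and Communications Technology
Principal Investigator	Peng Shen (沈鵬), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (80773118)
Project Period (FY)	2021-04-01 – 2024-03-31
Keywords	language identification / speech recognition / pre-training model / large language models / speaker diarization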
Outline of Research Achievements	In 2023, I focused on how to make better use of pre-trained and self-supervised models to improve the performance of language identification (LID) and automatic speech recognition (ASR). Specifically, the following work was carried out: 1. Following the success of ChatGPT, I began investigating generative models and tried to use the knowledge in such models to improve LID performance. Such investigations are important for understanding the behavior of large language models. This work was published at IEEE ASRU 2023. 2. I also worked on improving cross-domain ASR. We used large pre-trained models, such as BERT, and proposed optimal transport techniques to better exploit the knowledge transferred from them. This work was published at IEEE ICASSP 2022 and 2024 and at IEEE ASRU 2023. 3. I also proposed a novel speaker mask branch to detect the speech segments of individual speakers. With the proposed model, both ASR and speaker diarization can be performed simultaneously by a single model. Over the course of this project, I investigated and applied models trained with self-supervised or pre-training techniques to LID, speaker recognition, and ASR. Through this research, we clarified how to better utilize the knowledge inside pre-trained models and proposed several techniques, such as RNN-T-based LID and optimal transport-based ASR, to improve performance on these tasks. In particular, the proposed techniques were successfully used to build the NICT LID system, which showed very robust performance.

Research Products
(4 results)

Presentations (4 results) (of which international conferences: 3)

[Presentation] Hierarchical cross-modality knowledge transfer with Sinkhorn attention for CTC-based ASR (2024)
- Author(s)/Presenter(s)
  X. Lu, P. Shen, Y. Tsao, H. Kawai
- Organizer/Conference
  IEEE ICASSP
- International conference
[Presentation] Generative linguistic representation for spoken language identification (2023)
- Author(s)/Presenter(s)
  P. Shen, X. Lu, H. Kawai
- Organizer/Conference
  IEEE ASRU
- International conference
[Presentation] Cross-modal alignment with optimal transport for CTC-based ASR (2023)
- Author(s)/Presenter(s)
  X. Lu, P. Shen, Y. Tsao, H. Kawai
- Organizer/Conference
  IEEE ASRU
- International conference
[Presentation] Investigation on Multi-task Universal Speech Models (2023)
- Author(s)/Presenter(s)
  P. Shen, X. Lu, H. Kawai
- Organizer/Conference
  Autumn Meeting of Acoustical Society of Japan
