
FY2023 Annual Research Report

Self-supervised graph-based representation for language and speaker detection

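Research Project

Project/Area Number: 21K17776
Research Institution: National Institute of Information and Communications Technology

Principal Investigator

Peng Shen (沈 鵬), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (80773118)

Project Period (FY): 2021-04-01 – 2024-03-31
Keywords: language identification / speech recognition / pre-training model / large language models / speaker diarization
Outline of Research Achievements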

In FY2023, I focused on how to make better use of pre-trained and self-supervised models to improve the performance of language identification (LID) and automatic speech recognition (ASR). Specifically, the following work was carried out:
1. Following the success of ChatGPT, I began investigating generative models and used the knowledge in such models to improve LID performance. These investigations are important for understanding the behavior of large language models. This work was published at IEEE ASRU 2023.
2. I also worked on improving cross-domain ASR. We used large pre-trained models, such as BERT, and proposed optimal transport techniques to make better use of the knowledge transferred from these models. This work was published at IEEE ICASSP 2022 and 2024 and at IEEE ASRU 2023.
3. I also proposed a novel speaker mask branch to detect the speech segments of individual speakers. With the proposed model, ASR and speaker diarization can be performed simultaneously using a single model.
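
As a rough illustration of the optimal transport idea mentioned in item 2, the following is a minimal sketch of Sinkhorn iterations (entropy-regularized optimal transport) in plain Python. It is not the implementation from the published papers: the function name `sinkhorn`, the toy cost matrix, and the parameters `eps` and `n_iters` are assumptions chosen for illustration only. The rows can be read as acoustic frames and the columns as text tokens, with each cost entry measuring how poorly a frame and a token match.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Return an entropy-regularized transport plan between histograms a and b.

    cost[i][j] is the (assumed) cost of moving mass from source i to target j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: low-cost pairs get large weights.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row sums of the plan match a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of the plan match b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical toy problem: 3 acoustic frames, 2 text tokens.
cost = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
a = [1.0 / 3.0] * 3  # uniform mass over frames
b = [0.5, 0.5]       # uniform mass over tokens
plan = sinkhorn(cost, a, b)

for row in plan:
    print([round(p, 4) for p in row])
```

Each row of `plan` shows how one frame's mass is split across the tokens. A smaller `eps` gives a sharper, more one-to-one alignment, while a larger `eps` spreads the mass more evenly.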

Over the course of this project, I investigated and applied models trained with self-supervised or pre-training techniques to LID, speaker recognition, and ASR tasks. Through this research, we clarified how to make better use of the knowledge inside pre-trained models and proposed several techniques, such as RNN-T-based LID and optimal transport-based ASR, to improve the performance of these tasks. In particular, the proposed techniques were successfully used to build the NICT LID system, which showed robust performance.

  • Research Products

    (4 results)

Presentations (4 results) (of which 3 at international conferences)

  • [Presentation] Hierarchical cross-modality knowledge transfer with Sinkhorn attention for CTC-based ASR (2024)

    • Author(s)/Presenter(s)
      X. Lu, P. Shen, Y. Tsao, H. Kawai
    • Conference
      IEEE ICASSP
    • International conference
  • [Presentation] Generative linguistic representation for spoken language identification (2023)

    • Author(s)/Presenter(s)
      P. Shen, X. Lu, H. Kawai
    • Conference
      IEEE ASRU
    • International conference
  • [Presentation] Cross-modal alignment with optimal transport for CTC-based ASR (2023)

    • Author(s)/Presenter(s)
      X. Lu, P. Shen, Y. Tsao, H. Kawai
    • Conference
      IEEE ASRU
    • International conference
  • [Presentation] Investigation on Multi-task Universal Speech Models (2023)

    • Author(s)/Presenter(s)
      P. Shen, X. Lu, H. Kawai
    • Conference
      Autumn Meeting of Acoustical Society of Japan

Published: 2024-12-25
