2023 Fiscal Year Final Research Report

Self-supervised graph-based representation for language and speaker detection

Research Project

Project/Area Number 21K17776
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution National Institute of Information and Communications Technology

Principal Investigator

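Shen Peng  National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (80773118)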

Project Period (FY) 2021-04-01 – 2024-03-31
Keywords language identification / speech recognition / self-supervised learning / speaker recognition
Outline of Final Research Achievements

In this project, we developed self-supervised or pre-trained techniques to improve spoken language recognition and speaker recognition. We experimented with several methods for capturing the characteristics of languages and speakers from speech signals. The proposed techniques include transducer-based language embeddings, pronunciation-aware character encoding, cross-modal alignment, and generative linguistic representations. These techniques aim to improve language and speaker recognition as well as speech recognition. We also explored multi-task recognition, in which a single model performs language, speaker, and speech recognition. The results of this project have been published at top international conferences, including IEEE ICASSP, SLT, ASRU, and Interspeech.

Free Research Field

Perceptual information processing-related

Academic Significance and Societal Importance of the Research Achievements

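The main goal of this project is to advance the understanding and representation of speech signals, which is of significant scientific value. Techniques that improve performance in language and speaker recognition help advance practical technological applications.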

Published: 2025-01-30  
