| Research Project/Area Number | 23K11227 |
| Research Category | Grant-in-Aid for Scientific Research (C) |
| Allocation Type | Multi-year Fund |
| Application Category | General |
| Review Section | Basic Section 61030: Intelligent informatics |
| Research Institution | Institute of Science Tokyo (2024); National Institute of Information and Communications Technology (2023) |
| Principal Investigator | 李 勝, Institute of Science Tokyo, School of Engineering, Assistant Professor (70840940) |
| Co-Investigators | 李 吉屹, Hokkaido University, Faculty of Information Science and Technology, Associate Professor (30726667); チョ シンキ, Kyoto University, Graduate School of Informatics, Program-Specific Associate Professor (70784891) |
| Project Period (FY) | 2023-04-01 – 2026-03-31 |
| Project Status | Granted (FY2024) |
| Budget Amount | Total: 4,810 thousand yen (Direct: 3,700; Indirect: 1,110); FY2025: 1,300 (Direct: 1,000; Indirect: 300); FY2024: 1,300 (Direct: 1,000; Indirect: 300); FY2023: 2,210 (Direct: 1,700; Indirect: 510); all figures in thousand yen |
| Keywords | speech recognition / large language model / multilingual / multimodal / multitask / low-resource / quality estimation / federated learning |
| Outline of Research at the Start | Cross-modality, general-purpose multitask modeling, and cross-lingual communication are three key features of next-generation artificial intelligence. This research advances all three simultaneously in an automatic speech recognition (ASR) system, to answer three questions: (1) Can information from rich-resource languages aid the understanding of low-resource languages? (2) Can information from other modalities aid the understanding of low-resource languages? (3) Can additional information from other tasks aid the understanding of low-resource languages? |
| Outline of Annual Research Achievements | This project aims to solve the classic low-resource problem in speech recognition, searching for solutions in natural language processing (NLP), multimodal modeling, and large-scale data resources. Our findings appeared in, or were submitted to, traditional speech venues (ICASSP, Interspeech, and the IEEE/ACM TASLP journal) as well as a top NLP conference (ACL). I also took part in LLM-jp's LLM fine-tuning challenge and in estimating Japanese students' English speaking ability with LLMs. |
| Current Status of Research Progress | 1: Research has progressed more than it was originally planned |
| Reason | The world has entered the LLM era, and the FY2024 research focus was applying LLMs under low-resource conditions to improve speech recognition. We explored continual learning, in-context learning, and few-shot/zero-shot learning. (1) Continual learning mitigates catastrophic forgetting, which arises especially under frequent fine-tuning on new languages; we applied it to low-resource multilingual speech recognition (APSIPA ASC 2024) and extended the algorithm across tasks, from speech recognition to emotion recognition (ICASSP 2024). (2) Using in-context/zero-shot learning, we estimated Japanese students' English ability with an LLM (CEFR-J Symposium 2025). (3) For few-shot learning, we used a LoRA-fine-tuned Llama-7B model to correct speech recognition results in 20 languages, achieving a leap forward in accuracy (Interspeech 2024). (4) Inspired by the multimodal field, we introduced graph-based data structures to bridge the speech recognition system and the LLM (APSIPA ASC 2024). Illustrative sketches of techniques (1) to (3) follow this table. |
| Strategy for Future Research Activity | In FY2025, we will continue developing LLM-based methods for multilingual, multimodal, and multitask speech processing, incorporating recent progress in LLMs. We will also follow recent progress in embodied AI and social robotics. |
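The report does not spell out the continual-learning recipe, so as a point of reference for technique (1), here is a minimal Elastic Weight Consolidation (EWC) sketch in PyTorch, one standard remedy for catastrophic forgetting; it is not necessarily the method of the cited papers. The helper names (`snapshot`, `estimate_fisher`, `ewc_penalty`) and the data-loader interface are hypothetical.

```python
# Minimal EWC (Elastic Weight Consolidation) sketch in PyTorch.
# Illustrates one standard remedy for catastrophic forgetting; the
# cited papers may use a different continual-learning method.
import torch


def snapshot(model):
    """Store a frozen copy of the parameters learned on the previous language."""
    return {n: p.detach().clone() for n, p in model.named_parameters() if p.requires_grad}


def estimate_fisher(model, loader, loss_fn):
    """Diagonal Fisher information, estimated from squared gradients
    on data of the previously learned language."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    for inputs, targets in loader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    for n in fisher:
        fisher[n] /= max(len(loader), 1)
    return fisher


def ewc_penalty(model, old_params, fisher, lam=1000.0):
    """Quadratic penalty anchoring parameters important for old languages;
    added to the new language's task loss during fine-tuning."""
    loss = torch.zeros((), device=next(model.parameters()).device)
    for n, p in model.named_parameters():
        if n in old_params:
            loss = loss + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return lam / 2.0 * loss
```

During fine-tuning on a new language, the total objective would be `task_loss + ewc_penalty(model, old_params, fisher)`, so parameters with high Fisher values for old languages move less.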
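For technique (2), a zero-shot rating call might look like the sketch below. The SDK (OpenAI Python client), model name, level inventory, and prompt wording are all illustrative assumptions; the symposium work's actual setup is not described in this report.

```python
# Zero-shot CEFR-level estimation with an LLM: an illustrative sketch,
# not the exact prompt or model used in the CEFR-J Symposium 2025 work.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment;
# the model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "You are an English proficiency rater. Given a transcript of a Japanese "
    "student's spoken English, output only one CEFR-J level label "
    "(A1.1, A1.2, A1.3, A2.1, A2.2, B1.1, B1.2, B2.1, B2.2, C1, or C2)."
)


def estimate_cefr_level(transcript: str) -> str:
    """Ask the LLM for a single level label, with no in-context examples."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0.0,      # deterministic rating
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content.strip()
```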
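For technique (3), a LoRA fine-tuning setup for ASR hypothesis correction could be sketched as below with Hugging Face transformers and peft. The checkpoint id, prompt template, and LoRA hyperparameters are assumptions for illustration, not the Interspeech 2024 recipe.

```python
# ASR error correction with a LoRA-fine-tuned Llama model: a minimal
# sketch using Hugging Face transformers + peft. The model id, prompt
# template, and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder 7B checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

# Inject low-rank adapters into the attention projections; only these
# small matrices are trained, which suits low-resource fine-tuning.
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)


def build_example(lang: str, hypothesis: str, reference: str) -> str:
    """One training string: the noisy ASR hypothesis as input,
    the reference transcript as the target continuation."""
    return (f"Correct the {lang} speech recognition output.\n"
            f"Hypothesis: {hypothesis}\nCorrected: {reference}")

# Training proceeds as ordinary causal-LM fine-tuning over such strings;
# at inference, the prompt stops after "Corrected:" and the model
# generates the corrected transcript.
```

Restricting training to the adapters keeps per-language updates small, which is one reason LoRA is a natural fit for the 20-language low-resource setting described above.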