M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition

Research Project

Project/Area Number	23K11227
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	一般
Review Section	Basic Section 61030:Intelligent informatics-related
Research Institution	National Institute of Information and Communications Technology
Principal Investigator	李勝国立研究開発法人情報通信研究機構, ユニバーサルコミュニケーション研究所先進的音声翻訳研究開発推進センター, 研究員 (70840940)
Co-Investigator(Kenkyū-buntansha)	李吉屹山梨大学, 大学院総合研究部, 准教授 (30726667) チョシンキ京都大学, 情報学研究科, 特定准教授 (70784891)
Project Period (FY)	2023-04-01 – 2026-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount *help	¥4,810,000 (Direct Cost: ¥3,700,000、Indirect Cost: ¥1,110,000) Fiscal Year 2025: ¥1,300,000 (Direct Cost: ¥1,000,000、Indirect Cost: ¥300,000) Fiscal Year 2024: ¥1,300,000 (Direct Cost: ¥1,000,000、Indirect Cost: ¥300,000) Fiscal Year 2023: ¥2,210,000 (Direct Cost: ¥1,700,000、Indirect Cost: ¥510,000)
Keywords	speech recognition / Multitask / Multimodal / Multilingual / Low-resource / quality estimation / federated learning
Outline of Research at the Start	Cross-modality, general purposed multitask model, and cross-lingual communication ability are three key features of next-generation artificial intelligence. This research focuses on advancing these three features simultaneously in the speech recognition (ASR) system to prove: (1) Can rich-resourced language information aid the understanding of low-resource languages? (2) Can other modal information aid the understanding of low-resource languages? (3) Can additional information from other tasks aid in understanding low-resource languages?
Outline of Annual Research Achievements	This research project aims to solve the classic low-resource problem of speech recognition area and search for solutions from natural language processing (NLP), multimodal modeling and big data society. Research achievements of FY2023 were fruitful. Our publications appeared not only in traditional speech conferences (ICASSP/ASRU) and journals (speech communication) but also in NLP top conferences (ACL, IWSLT), big data conferences (DASFAA), neural network conference (ICANN), and multimedia conferences (ACM Multimedia Asia). The achievements were also reported in domestic conferences of both speech and NLP. I also devoted myself into challenges of speech recognition and quality estimation of speech synthesis, both got top ranking scores.
Current Status of Research Progress	Current Status of Research Progress 2: Research has progressed on the whole more than it was originally planned. Reason To solve the low-resourced problems of speech recognition, we proposed the following methods regarding multimodal, multilingual, and multitasking: 1. For the multilingual problem, we proposed universal language modeling technology. In FY2023, an enhanced hierarchical softmax modeling method was used to encode hundreds of languages, and we reported on it in the ASJ2023 autumn. We also hold a workshop to promote data collection and sharing for low-resourced languages. 2. For multimodal modeling, we introduced multimodal modeling technology into speech processing, such as model reprogramming technology. 3. The pretrained speech and language models were used together within my proposed multitasking downstreaming framework. I successfully combined the wav2vec2.0 model with the GPT and BERT models for dialectical speech recognition. Moreover, I proposed combining the current state-of-the-art speech recognition model, OpenAI whisper, with a large language model, Meta Llama2.0.
Strategy for Future Research Activity	In FY2023, the large language model attracted significant attention from both industry and academia. In my research, I also empirically proved that it can revolutionize performance in most speech tasks. So, in FY2024, I will integrate the large language model into our speech recognition task. And in the meantime, I will keep an eye on multimodal modeling technology.

Report

(1 results)

2023 Research-status Report

Research Products
(26 results)

All 2024 2023 Other

All Int'l Joint Research (1 results) Journal Article (4 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 4 results, Open Access: 1 results) Presentation (16 results) (of which Int'l Joint Research: 13 results, Invited: 1 results) Remarks (4 results) Funded Workshop (1 results)

[Int'l Joint Research] Nanyang Technological University(シンガポール)
- Related Report
  2023 Research-status Report
[Journal Article] Phantom in the opera: adversarial music attack for robot dialogue system2024
- Author(s)
  Li Sheng、Li Jiyi、Cao Yang
- Journal Title
  
  Frontiers in Computer Science, 15 February 2024
  
  Volume: 6 Pages: 1-9
- DOI
  10.3389/fcomp.2024.1355975
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network2024
- Author(s)
  Li Nan、Wang Longbiao、Ge Meng、Unoki Masashi、Li Sheng、Dang Jianwu
- Journal Title
  
  Speech Communication
  
  Volume: 157 Pages: 103024-103024
- DOI
  10.1016/j.specom.2023.103024
- Related Report
  2023 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Disordered speech recognition considering low resources and abnormal articulation2023
- Author(s)
  Lin Yuqin、Dang Jianwu、Wang Longbiao、Li Sheng、Ding Chenchen
- Journal Title
  
  Speech Communication
  
  Volume: 155 Pages: 103002-103002
- DOI
  10.1016/j.specom.2023.103002
- Related Report
  2023 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings2023
- Author(s)
  Soky Kak、Li Sheng、Chu Chenhui、Kawahara Tatsuya
- Journal Title
  
  International Journal of Asian Language Processing
  
  Volume: 33 Issue: 04 Pages: 2350024-2350024
- DOI
  10.1142/s2717554523500248
- Related Report
  2023 Research-status Report
- Peer Reviewed
[Presentation] Investigating effective methods for combining large language model with speech recognition system2024
- Author(s)
  李勝, 楊正東, 周汪勁, Chenhui Chu, 河井恒
- Organizer
  日本音響学会第151回(2024年春季)研究発表会
- Related Report
  2023 Research-status Report
[Presentation] Combining Large Language Model with Speech Recognition System in Low-resource Settings2024
- Author(s)
  李勝, 楊正東, 周汪勁, Chenhui Chu, Chen Chen, Chng Eng Siong, 河井恒
- Organizer
  言語処理学会第30回年次大会
- Related Report
  2023 Research-status Report
[Presentation] Cross-lingual Mapping for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition2024
- Author(s)
  Zhengdong Yang, Qianying Liu, Sheng Li, Chenhui Chu, Fei Cheng, Sadao Kurohashi
- Organizer
  日本音響学会第 150 回(2023 年秋季)研究発表会
- Related Report
  2023 Research-status Report
[Presentation] MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction2024
- Author(s)
  Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara
- Organizer
  IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement2023
- Author(s)
  Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu
- Organizer
  IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimers Speech Detection2023
- Author(s)
  Wenqing Wei, Zhengdong Yang, Yuan Gao, Jiyi Li, Chenhui Chu, Shogo Okada, Sheng Li
- Organizer
  IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2023
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] KyotoMOS: An Automatic MOS Scoring System for Speech Synthesis2023
- Author(s)
  Wangjin Zhou, Zhengdong Yang, Sheng Li, Chenhui Chu
- Organizer
  ACM Multimedia Asia Workshops 2023
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System2023
- Author(s)
  Xiaojiao Chen, Sheng Li, Jiyi Li, Yang Cao, Hao Huang, Liang He
- Organizer
  ACM Multimedia Asia
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization2023
- Author(s)
  Xiaojiao Chen, Sheng Li, Jiyi Li, Hao Huang, Yang Cao, Liang He
- Organizer
  ACM Multimedia Asia
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition2023
- Author(s)
  Sheng Li, Jiyi Li
- Organizer
  Artificial Neural Networks and Machine Learning (ICANN) 2023
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] The Kyoto Speech-to-Speech Translation System for IWSLT 20232023
- Author(s)
  Zhengdong Yang, Shuichiro Shimizu, Wangjin Zhou, Sheng Li, Chenhui Chu
- Organizer
  International Conference on Spoken Language Translation (IWSLT) 2023
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Dialogue State Tracking with Sparse Local Slot Attention2023
- Author(s)
  Longfei Yang, Jiyi Li, Sheng Li, Takahiro Shinozaki
- Organizer
  ACL 2023 Workshop on NLP for Conversational AI
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Multi-Domain Dialogue State Tracking with Disentangled Domain-Slot Attention2023
- Author(s)
  Longfei Yang, Jiyi Li, Sheng Li, Takahiro Shinozaki
- Organizer
  In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Findings
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Towards Speech Dialogue Translation Mediating Speakers of Different Languages2023
- Author(s)
  Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi
- Organizer
  In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023): Findings
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Hierarchical Softmax for End-To-End Low-Resource Multilingual Speech Recognition2023
- Author(s)
  Qianying Liu Zhuo Gong Zhengdong Yang Yuhang Yang Sheng Li Chenchen Ding Nobuaki Minematsu Hao Huang Fei Cheng Chenhui Chu Sadao Kurohashi
- Organizer
  2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Self-Supervised Learning MOS Prediction with Listener Enhancement2023
- Author(s)
  Sheng Li
- Organizer
  VoiceMOS mini workshop
- Related Report
  2023 Research-status Report
- Int'l Joint Research / Invited
[Remarks] 情報通信研究機構の研究成果として、各年ごとの発表論文を日付順で紹介します。
- URL
  https://oca-pub.nict.go.jp/
- Related Report
  2023 Research-status Report
[Remarks] google scholar of Sheng Li
- URL
  https://scholar.google.com/citations?user=zHAhs0IAAAAJ&hl=en
- Related Report
  2023 Research-status Report
[Remarks] Lab homepage of Sheng Li
- URL
  https://ast-astrec.nict.go.jp/member/sheng-li/index.html
- Related Report
  2023 Research-status Report
[Remarks] reseachmap homepage of Sheng Li
- URL
  https://researchmap.jp/listen
- Related Report
  2023 Research-status Report
[Funded Workshop] ACM Multimedia Asia 2023 workshop: Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages (M3Oriental)2023
- Related Report
  2023 Research-status Report

M3OLR: Towards Effective Multilingual, Multimodal and Multitask Oriental Low-resourced Language Speech Recognition

Principal Investigator

李 勝 国立研究開発法人情報通信研究機構, ユニバーサルコミュニケーション研究所先進的音声翻訳研究開発推進センター, 研究員 (70840940)

¥4,810,000 (Direct Cost: ¥3,700,000、Indirect Cost: ¥1,110,000)

Current Status of Research Progress

Reason

Report

Research Products

[Int'l Joint Research] Nanyang Technological University(シンガポール)

Related Report

[Journal Article] Phantom in the opera: adversarial music attack for robot dialogue system2024

Author(s)

Journal Title

DOI

Related Report

[Journal Article] Robust voice activity detection using an auditory-inspired masked modulation encoder based convolutional attention network2024

Author(s)

Journal Title

DOI

Related Report

[Journal Article] Disordered speech recognition considering low resources and abnormal articulation2023

Author(s)

Journal Title

DOI

Related Report

[Journal Article] Finetuning Pretrained Model with Embedding of Domain and Language Information for ASR of Very Low-Resource Settings2023

Author(s)

Journal Title

DOI

Related Report

[Presentation] Investigating effective methods for combining large language model with speech recognition system2024

Author(s)

Organizer

Related Report

[Presentation] Combining Large Language Model with Speech Recognition System in Low-resource Settings2024

Author(s)

Organizer

Related Report

[Presentation] Cross-lingual Mapping for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition2024

Author(s)

Organizer

Related Report

[Presentation] MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction2024

Author(s)

Organizer

Related Report

[Presentation] LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement2023

Author(s)

Organizer

Related Report

[Presentation] FedCPC: An Effective Federated Contrastive Learning Method for Privacy Preserving Early-Stage Alzheimers Speech Detection2023

Author(s)

Organizer

Related Report

[Presentation] KyotoMOS: An Automatic MOS Scoring System for Speech Synthesis2023

Author(s)

Organizer

Related Report

[Presentation] GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System2023

Author(s)

Organizer

Related Report

[Presentation] Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization2023

Author(s)

Organizer

Related Report

[Presentation] Correction while Recognition: Combining Pretrained Language Model for Taiwan-Accented Speech Recognition2023

Author(s)

Organizer

Related Report

[Presentation] The Kyoto Speech-to-Speech Translation System for IWSLT 20232023

Author(s)

Organizer

Related Report

[Presentation] Dialogue State Tracking with Sparse Local Slot Attention2023

Author(s)

Organizer

Related Report

[Presentation] Multi-Domain Dialogue State Tracking with Disentangled Domain-Slot Attention2023

Author(s)

李勝国立研究開発法人情報通信研究機構, ユニバーサルコミュニケーション研究所先進的音声翻訳研究開発推進センター, 研究員 (70840940)