Next generation multilingual End-to-End speech recognition (from G30 to G200)
Project/Area Number |
19K24376
|
Research Category |
Grant-in-Aid for Research Activity Start-up
|
Allocation Type | Multi-year Fund |
Review Section |
1002:Human informatics, applied informatics and related fields
|
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
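Li Sheng National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Researcher (70840940)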
|
Project Period (FY) |
2019-08-30 – 2021-03-31
|
Project Status |
Completed (Fiscal Year 2020)
|
Budget Amount |
¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
|
Keywords | speech recognition / multilingual / articulation / End-to-End / multilingual modeling / low-resourced modeling / speech translation / multi-unit modeling / language identification / disordered speech / code-switched / speaker diarization |
Outline of Research at the Start |
This project will focus on two problems: recognizing low-resource languages (e.g., ASEAN languages), and modeling as many languages as possible (hundreds of languages from all language families) in a single model under the current state-of-the-art End-to-End automatic speech recognition (ASR) framework.
|
Outline of Final Research Achievements |
As the most natural means of communication, voice interfaces supported by automatic speech recognition (ASR) technology have become crucial to human-computer interaction (HCI) on the many devices of today's highly digitized society. Most commercial ASR-enabled products focus on a few popular languages, such as English, French, Chinese, and Japanese. Speech recognition for less popular languages, such as the ASEAN languages, remains a topic worthy of continued research. Globalization also gives rise to many real-life situations that call for multilingual communication, such as regional events, cultural exchanges, and festivals. This project focused on tackling the problem of low-resource data and on modeling many languages in a single model under the current state-of-the-art End-to-End modeling framework. We also investigated these problems in depth.
|
Academic Significance and Societal Importance of the Research Achievements |
This research shows that linguistic knowledge can be integrated into a neural network as an alternative to adding more layers or enlarging the model. The proposed method is broadly applicable to Society 5.0 tasks, such as multilingual speech recognition and disordered speech recognition.
|
Report
(3 results)
Research Products
(40 results)