2020 Fiscal Year Final Research Report
Next generation multilingual End-to-End speech recognition (from G30 to G200)
Project/Area Number |
19K24376
|
Research Category |
Grant-in-Aid for Research Activity Start-up
|
Allocation Type | Multi-year Fund |
Review Section |
1002:Human informatics, applied informatics and related fields
|
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
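Li Sheng, National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Researcher (70840940)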
|
Project Period (FY) |
2019-08-30 – 2021-03-31
|
Keywords | speech recognition / multilingual / articulation / End-to-End |
Outline of Final Research Achievements |
As the most natural means of communication, voice interfaces supported by automatic speech recognition (ASR) technology have become crucial to human-computer interaction (HCI) on the many devices of today's highly digitized society. Most commercial ASR-enabled products focus on a few popular languages, such as English, French, Chinese, and Japanese. Speech recognition for less popular languages, such as the ASEAN languages, remains a topic that merits continued research. Globalization gives rise to many real-life situations that call for multilingual communication, such as regional events, cultural exchanges, and festivals. This project focused on two problems within the current state-of-the-art End-to-End modeling framework: low-resource data and modeling many languages in a single model. We also investigated these problems in depth.
|
Free Research Field |
Perceptual information processing
|
Academic Significance and Societal Importance of the Research Achievements |
This research shows that linguistic knowledge can be integrated into the neural network as an alternative to adding more layers or enlarging the model. The proposed method is applicable to a broad range of Society 5.0 tasks, such as multilingual speech recognition and disordered speech recognition.
|