Next generation multilingual End-to-End speech recognition (from G30 to G200)

Research Project

Project/Area Number	19K24376
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Multi-year Fund
Review Section	1002:Human informatics, applied informatics and related fields
Research Institution	National Institute of Information and Communications Technology
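Principal Investigator	Li Sheng, National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Researcher (70840940)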
Project Period (FY)	2019-08-30 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000); Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords	speech recognition / multilingual / articulation / End-to-End / multilingual modeling / low-resourced modeling / speech translation / multi-unit modeling / language identification / disordered speech / code-switched / end-to-end / speaker diarization
Outline of Research at the Start	This project will focus on tackling the problem of low-resource languages (e.g., ASEAN languages) and on modeling as many languages as possible (hundreds of languages from all language families) in a single model under the current state-of-the-art End-to-End automatic speech recognition (ASR) framework.
Outline of Final Research Achievements	As the most natural way of communication, the voice interface, supported by automatic speech recognition (ASR) technology, has become crucial to human-computer interaction (HCI) on the many devices of today's highly digitized society. Most commercial ASR-enabled products focus on specific popular languages such as English, French, Chinese, and Japanese. Speech recognition for less popular languages, such as the ASEAN languages, remains a topic worthy of continued research. Globalization gives rise to many real-life situations of multilingual communication, such as regional events, cultural exchanges, and festivals. The project focused on tackling the problems of low-resource data and of modeling many languages in a single model under the current state-of-the-art End-to-End modeling framework. We also made an in-depth investigation of these problems.
Academic Significance and Societal Importance of the Research Achievements	This research shows that linguistic knowledge can be integrated into the neural network instead of adding more layers or enlarging the model size. The proposed method is broadly applicable to tasks for Society 5.0, such as multilingual speech recognition and disordered speech recognition.

Report

(3 results)

2020 Annual Research Report Final Research Report ( PDF )
2019 Research-status Report

Research Products
(40 results)

All Int'l Joint Research (2 results) Journal Article (1 results) (of which Peer Reviewed: 1 results) Presentation (24 results) (of which Int'l Joint Research: 18 results, Invited: 4 results) Book (1 results) Remarks (5 results) Patent(Industrial Property Rights) (4 results) Funded Workshop (3 results)

[Int'l Joint Research] Tianjin University/Xinjiang University/Hithink RoyalFlush AI (China)
- Related Report
  2020 Annual Research Report
[Int'l Joint Research] Tianjin University (China)
- Related Report
  2019 Research-status Report
[Journal Article] Knowledge Distillation-based Representation Learning for Short-Utterance Spoken Language Identification (2020)
- Author(s)
  P. Shen, X. Lu, S. Li, H. Kawai.
- Journal Title
  IEEE/ACM Trans. Audio, Speech & Language Process.
  Volume: 28 Pages: 2674-2683
- DOI
  10.1109/taslp.2020.3023627
- Related Report
  2020 Annual Research Report
- Peer Reviewed
[Presentation] Robust voice activity detection using a masked auditory encoder based convolutional neural network (2021)
- Author(s)
  N. Li, L. Wang, M. Unoki, S. Li, R. Wang, M. Ge, J. Dang
- Organizer
  IEEE-ICASSP, 2021
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] An investigation of using hybrid modeling units for improving End-to-End speech recognition systems (2021)
- Author(s)
  S. Chen, X. Hu, S. Li, X. Xu
- Organizer
  IEEE-ICASSP, 2021.
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Encoder-Decoder based pitch tracking and joint model training for Mandarin tone classification (2021)
- Author(s)
  H. Huang, K. Wang, Y. Hu, S. Li
- Organizer
  IEEE-ICASSP, 2021.
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Comparison of End-to-End Models for Joint Speaker and Speech Recognition (2021)
- Author(s)
  K. Soky, S. Li, M. Mimura, C. Chu, T. Kawahara
- Organizer
  IEICE-SP, 2021.
- Related Report
  2020 Annual Research Report
[Presentation] Phantom in the Opera: Effective Adversarial Music Attack on Keyword Spotting Systems (2020)
- Author(s)
  H. Zhang, S. Li, X. Ma, Y. Zhao, Y. Cao, T. Kawahara
- Organizer
  IEEE-SLT, 2021
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Multilingual transformer training for Khmer automatic speech recognition (2020)
- Author(s)
  K. Soky, S. Li, T. Kawahara, S. Seng
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] End-to-End Speech Translation with Cross-lingual Transfer Learning (2020)
- Author(s)
  S. Shimizu, C. Chu, S. Li, S. Kurohashi
- Organizer
  NLP, 2021.
- Related Report
  2020 Annual Research Report
[Presentation] Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data and mask embedding (2020)
- Author(s)
  S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] A Mixture of Character and Word End-to-End System for Keyword Spotting (2020)
- Author(s)
  H. Zhang, S. Ueno, M. Mimura, S. Li, W. Zhang, T. Kawahara
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020)(full paper).
- Related Report
  2020 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data (2020)
- Author(s)
  S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda.
- Organizer
  In Proc. ICONIP, 2020.
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription (2020)
- Author(s)
  Y. Lin, L. Wang, S. Li, J. Dang, and C. Ding.
- Organizer
  In Proc. INTERSPEECH, 2020 (Travel Granted by ISCA).
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] VOIS: The First Speech Therapy App in the World for Myanmar Hearing-Impaired Children (2020)
- Author(s)
  A. Thida, N. Han, S. Oo, S. Li and C. Ding.
- Organizer
  In Proc. O-COCOSDA, 2020.
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release (2020)
- Author(s)
  Y. Han, Y. Cao, S. Li, Q. Ma, M. Yoshikawa.
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020) (invited report).
- Related Report
  2020 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Voice-Indistinguishability: Protecting Voiceprint with Differential Privacy under an Untrusted Server (2020)
- Author(s)
  Y. Han, Y. Cao, S. Li, Q. Ma, M. Yoshikawa.
- Organizer
  ACM conference on Computer and Communications Security (CCS), demo, 2020.
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] System Description for Voice Privacy Challenge (Kyoto Team) (2020)
- Author(s)
  Y. Han, S. Li, Y. Cao, M. Yoshikawa
- Organizer
  In special session of INTERSPEECH 2020 (VoicePrivacy challenge 2020).
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Singing Voice Extraction with Attention based Spectrograms Fusion (2020)
- Author(s)
  H. Shi, L. Wang, S. Li, C. Ding, M. Ge, N. Li, J. Dang, and H. Seki.
- Organizer
  In Proc. INTERSPEECH, 2020 (Travel Granted by ISCA).
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Joint Training End-to-End Speech Recognition Systems with Speaker Attributes (2020)
- Author(s)
  S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
- Organizer
  ISCA-Odyssey (The Speaker and Language Recognition Workshop)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Compensation on x-vector for short utterance spoken language identification (2020)
- Author(s)
  P. Shen, X. Lu, K. Sugiura, S. Li and H. Kawai.
- Organizer
  ISCA-Odyssey (The Speaker and Language Recognition Workshop)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Voice-Indistinguishability: Protecting Voiceprint in Privacy Preserving Speech Data Release (2020)
- Author(s)
  Y. Han, S. Li, Y. Cao, Q. Ma and M. Yoshikawa.
- Organizer
  IEEE-ICME
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] End-To-End Articulatory Modeling for Dysarthria Articulatory Attribute Detection (2020)
- Author(s)
  Y. Lin, L. Wang, J. Dang, S. Li, and C. Ding.
- Organizer
  IEEE-ICASSP
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation (2020)
- Author(s)
  H. Shi, L. Wang, M. Ge, S. Li, and J. Dang.
- Organizer
  IEEE-ICASSP
- Related Report
  2019 Research-status Report
[Presentation] End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition (2020)
- Author(s)
  S. Li, C. Ding, X. Lu, P. Shen and H. Kawai
- Organizer
  Acoustical Society of Japan, spring, 2020.
- Related Report
  2019 Research-status Report
[Presentation] Joint Training End-to-End Systems for Speech and Speaker Recognition with Speaker Attributes (2020)
- Author(s)
  S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
- Organizer
  Acoustical Society of Japan, spring, 2020.
- Related Report
  2019 Research-status Report
[Presentation] Improvement of x-vector for short utterance spoken language identification (2020)
- Author(s)
  P. Shen, X. Lu, K. Sugiura, S. Li, H. Kawai
- Organizer
  Acoustical Society of Japan, spring, 2020.
- Related Report
  2019 Research-status Report
[Book] Automatic speech recognition (2020)
- Author(s)
  X. Lu, S. Li, M. Fujimoto
- Total Pages
  18
- Publisher
  Springer Singapore
- ISBN
  9789811505959
- Related Report
  2019 Research-status Report
[Remarks] publication information on DBLP
- URL
  https://dblp.dagstuhl.de/pid/23/3439-10.html
- Related Report
  2020 Annual Research Report
[Remarks] Google scholar homepage
- URL
  https://scholar.google.com/citations?hl=en&user=zHAhs0IAAAAJ
- Related Report
  2020 Annual Research Report
[Remarks] researchmap homepage
- URL
  https://researchmap.jp/listen
- Related Report
  2020 Annual Research Report
[Remarks] NICT researcher's homepage
- URL
  https://ast-astrec.nict.go.jp/aboutus/member/sheng-li/index.html
- Related Report
  2020 Annual Research Report
[Remarks] ResearchGate researcher's homepage
- URL
  https://www.researchgate.net/profile/Sheng-Li-60
- Related Report
  2020 Annual Research Report
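[Patent(Industrial Property Rights)] Inference device and training method for inference device (推論器および推論器の学習方法) (2020)
- Inventor(s)
  S. Li, X. Lu, H. Kawai
- Industrial Property Rights Holder
  National Institute of Information and Communications Technology
- Industrial Property Rights Type
  Patent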
- Industrial Property Number
  2020-059962
- Filing Date
  2020
- Related Report
  2019 Research-status Report
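[Patent(Industrial Property Rights)] Inference device, inference program, and training method (推論器、推論プログラムおよび学習方法) (2019)
- Inventor(s)
  S. Li, X. Lu, C. Ding, T. Kawahara, H. Kawai
- Industrial Property Rights Holder
  National Institute of Information and Communications Technology
- Industrial Property Rights Type
  Patent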
- Industrial Property Number
  2019-163555
- Filing Date
  2019
- Related Report
  2019 Research-status Report
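[Patent(Industrial Property Rights)] Inference device, training method, and training program (推論器、学習方法および学習プログラム) (2019)
- Inventor(s)
  S. Li, X. Lu, R. Dabre, H. Kawai
- Industrial Property Rights Holder
  National Institute of Information and Communications Technology
- Industrial Property Rights Type
  Patent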
- Industrial Property Number
  2019-051008
- Filing Date
  2019
- Related Report
  2019 Research-status Report
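[Patent(Industrial Property Rights)] Method and apparatus for training a language identification model, and computer program therefor (言語識別モデルの訓練方法及び装置、並びにそのためのコンピュータプログラム) (2019)
- Inventor(s)
  P. Shen, X. Lu, S. Li, H. Kawai
- Industrial Property Rights Holder
  National Institute of Information and Communications Technology
- Industrial Property Rights Type
  Patent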
- Industrial Property Number
  2019-086005
- Filing Date
  2019
- Acquisition Date
  2020
- Related Report
  2019 Research-status Report
[Funded Workshop] Odyssey2020 The Speaker and Language Recognition Workshop (2020)
- Related Report
  2019 Research-status Report
[Funded Workshop] ICASSP2020 (2020)
- Related Report
  2019 Research-status Report
[Funded Workshop] ICME2020 (2020)
- Related Report
  2019 Research-status Report
