Next generation multilingual End-to-End speech recognition (from G30 to G200)

Research Project

Project/Area Number 19K24376
Research Category

Grant-in-Aid for Research Activity Start-up

Allocation Type Multi-year Fund
Review Section 1002: Human informatics, applied informatics and related fields
Research Institution National Institute of Information and Communications Technology

Principal Investigator

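Li Sheng  National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Researcher (70840940)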

Project Period (FY) 2019-08-30 – 2021-03-31
Project Status Completed (Fiscal Year 2020)
Budget Amount
¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords speech recognition / multilingual / articulation / End-to-End / multilingual modeling / low-resourced modeling / speech translation / multi-unit modeling / language identification / disordered speech / code-switched / end-to-end / speaker diarization
Outline of Research at the Start

This project will focus on tackling the problems of low-resource languages (e.g., ASEAN languages) and on modeling as many languages as possible (hundreds of languages from all language families) in a single model under the current state-of-the-art End-to-End automatic speech recognition (ASR) framework.

Outline of Final Research Achievements

As the most natural way of communicating, the voice interface, supported by automatic speech recognition (ASR) technology, has become crucial to human-computer interaction (HCI) on the many devices of today's highly digitized society. Most commercial ASR-enabled products focus on specific popular languages such as English, French, Chinese, and Japanese. Speech recognition for less popular languages, such as the ASEAN languages, is still a topic worthy of continued research. Globalization gives rise to many real-life situations that call for multilingual communication, such as regional events, cultural exchanges, and festivals.
The project focused on tackling the problem of low-resource data and on modeling many languages in a single model under the current state-of-the-art End-to-End modeling framework. We also investigated these problems in depth.

Academic Significance and Societal Importance of the Research Achievements

This research shows that linguistic knowledge can be integrated into the neural network instead of adding more layers or enlarging the model. The proposed method is applicable to a broad range of tasks for Society 5.0 (such as multilingual speech recognition and disordered speech recognition).

Report

(3 results)
  • 2020 Annual Research Report   Final Research Report ( PDF )
  • 2019 Research-status Report
  • Research Products

    (40 results)

All Int'l Joint Research (2 results) Journal Article (1 results) (of which Peer Reviewed: 1 results) Presentation (24 results) (of which Int'l Joint Research: 18 results,  Invited: 4 results) Book (1 results) Remarks (5 results) Patent(Industrial Property Rights) (4 results) Funded Workshop (3 results)

  • [Int'l Joint Research] Tianjin University/Xinjiang University/Hithink RoyalFlush AI (China)

    • Related Report
      2020 Annual Research Report
  • [Int'l Joint Research] Tianjin University (China)

    • Related Report
      2019 Research-status Report
  • [Journal Article] Knowledge Distillation-based Representation Learning for Short-Utterance Spoken Language Identification (2020)

    • Author(s)
      P. Shen, X. Lu, S. Li, H. Kawai
    • Journal Title

      IEEE/ACM Trans. Audio, Speech & Language Process.

      Volume: 28 Pages: 2674-2683

    • DOI

      10.1109/taslp.2020.3023627

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed
  • [Presentation] Robust voice activity detection using a masked auditory encoder based convolutional neural network (2021)

    • Author(s)
      N. Li, L. Wang, M. Unoki, S. Li, R. Wang, M. Ge, J. Dang
    • Organizer
      IEEE-ICASSP, 2021
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] An investigation of using hybrid modeling units for improving End-to-End speech recognition systems (2021)

    • Author(s)
      S. Chen, X. Hu, S. Li, X. Xu
    • Organizer
      IEEE-ICASSP, 2021.
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Encoder-Decoder based pitch tracking and joint model training for Mandarin tone classification (2021)

    • Author(s)
      H. Huang, K. Wang, Y. Hu, S. Li
    • Organizer
      IEEE-ICASSP, 2021.
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Comparison of End-to-End Models for Joint Speaker and Speech Recognition (2021)

    • Author(s)
      K. Soky, S. Li, M. Mimura, C. Chu, T. Kawahara
    • Organizer
      IEICE-SP, 2021.
    • Related Report
      2020 Annual Research Report
  • [Presentation] Phantom in the Opera: Effective Adversarial Music Attack on Keyword Spotting Systems (2020)

    • Author(s)
      H. Zhang, S. Li, X. Ma, Y. Zhao, Y. Cao, T. Kawahara
    • Organizer
      IEEE-SLT, 2021
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Multilingual transformer training for Khmer automatic speech recognition (2020)

    • Author(s)
      K. Soky, S. Li, T. Kawahara, S. Seng
    • Organizer
      Interspeech 2020 Satellite Workshop (SLIMTS2020)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] End-to-End Speech Translation with Cross-lingual Transfer Learning (2020)

    • Author(s)
      S. Shimizu, C. Chu, S. Li, S. Kurohashi
    • Organizer
      NLP, 2021.
    • Related Report
      2020 Annual Research Report
  • [Presentation] Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data and mask embedding (2020)

    • Author(s)
      S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda
    • Organizer
      Interspeech 2020 Satellite Workshop (SLIMTS2020)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] A Mixture of Character and Word End-to-End System for Keyword Spotting (2020)

    • Author(s)
      H. Zhang, S. Ueno, M. Mimura, S. Li, W. Zhang, T. Kawahara
    • Organizer
      Interspeech 2020 Satellite Workshop (SLIMTS2020)(full paper).
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data (2020)

    • Author(s)
      S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda
    • Organizer
      In Proc. ICONIP, 2020.
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription (2020)

    • Author(s)
      Y. Lin, L. Wang, S. Li, J. Dang, and C. Ding
    • Organizer
      In Proc. INTERSPEECH, 2020 (Travel Granted by ISCA).
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] VOIS: The First Speech Therapy App in the World for Myanmar Hearing-Impaired Children (2020)

    • Author(s)
      A. Thida, N. Han, S. Oo, S. Li and C. Ding
    • Organizer
      In Proc. O-COCOSDA, 2020.
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release (2020)

    • Author(s)
      Y. Han, Y. Cao, S. Li, Q. Ma, M. Yoshikawa
    • Organizer
      Interspeech 2020 Satellite Workshop (SLIMTS2020) (invited report).
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Voice-Indistinguishability: Protecting Voiceprint with Differential Privacy under an Untrusted Server (2020)

    • Author(s)
      Y. Han, Y. Cao, S. Li, Q. Ma, M. Yoshikawa
    • Organizer
      ACM conference on Computer and Communications Security (CCS), demo, 2020.
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] System Description for Voice Privacy Challenge (Kyoto Team) (2020)

    • Author(s)
      Y. Han, S. Li, Y. Cao, M. Yoshikawa
    • Organizer
      In special session of INTERSPEECH 2020 (VoicePrivacy challenge 2020).
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Singing Voice Extraction with Attention based Spectrograms Fusion (2020)

    • Author(s)
      H. Shi, L. Wang, S. Li, C. Ding, M. Ge, N. Li, J. Dang, and H. Seki
    • Organizer
      In Proc. INTERSPEECH, 2020 (Travel Granted by ISCA).
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Joint Training End-to-End Speech Recognition Systems with Speaker Attributes (2020)

    • Author(s)
      S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
    • Organizer
      ISCA-Odyssey (The Speaker and Language Recognition Workshop)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Compensation on x-vector for short utterance spoken language identification (2020)

    • Author(s)
      P. Shen, X. Lu, K. Sugiura, S. Li and H. Kawai
    • Organizer
      ISCA-Odyssey (The Speaker and Language Recognition Workshop)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Voice-Indistinguishability: Protecting Voiceprint in Privacy Preserving Speech Data Release (2020)

    • Author(s)
      Y. Han, S. Li, Y. Cao, Q. Ma and M. Yoshikawa
    • Organizer
      IEEE-ICME
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] End-To-End Articulatory Modeling for Dysarthria Articulatory Attribute Detection (2020)

    • Author(s)
      Y. Lin, L. Wang, J. Dang, S. Li, and C. Ding
    • Organizer
      IEEE-ICASSP
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Spectrograms Fusion with Minimum Difference Masks Estimation for Monaural Speech Dereverberation (2020)

    • Author(s)
      H. Shi, L. Wang, M. Ge, S. Li, and J. Dang
    • Organizer
      IEEE-ICASSP
    • Related Report
      2019 Research-status Report
  • [Presentation] End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition (2020)

    • Author(s)
      S. Li, C. Ding, X. Lu, P. Shen and H. Kawai
    • Organizer
      Acoustical Society of Japan, spring, 2020.
    • Related Report
      2019 Research-status Report
  • [Presentation] Joint Training End-to-End Systems for Speech and Speaker Recognition with Speaker Attributes (2020)

    • Author(s)
      S. Li, X. Lu, R. Dabre, P. Shen and H. Kawai
    • Organizer
      Acoustical Society of Japan, spring, 2020.
    • Related Report
      2019 Research-status Report
  • [Presentation] Improvement of x-vector for short utterance spoken language identification (2020)

    • Author(s)
      P. Shen, X. Lu, K. Sugiura, S. Li, H. Kawai
    • Organizer
      Acoustical Society of Japan, spring, 2020.
    • Related Report
      2019 Research-status Report
  • [Book] Automatic speech recognition (2020)

    • Author(s)
      X. Lu, S. Li, M. Fujimoto
    • Total Pages
      18
    • Publisher
      Springer Singapore
    • ISBN
      9789811505959
    • Related Report
      2019 Research-status Report
  • [Remarks] publication information on DBLP

    • URL

      https://dblp.dagstuhl.de/pid/23/3439-10.html

    • Related Report
      2020 Annual Research Report
  • [Remarks] Google scholar homepage

    • URL

      https://scholar.google.com/citations?hl=en&user=zHAhs0IAAAAJ

    • Related Report
      2020 Annual Research Report
  • [Remarks] researchmap homepage

    • URL

      https://researchmap.jp/listen

    • Related Report
      2020 Annual Research Report
  • [Remarks] NICT researcher's homepage

    • URL

      https://ast-astrec.nict.go.jp/aboutus/member/sheng-li/index.html

    • Related Report
      2020 Annual Research Report
  • [Remarks] ResearchGate researcher's homepage

    • URL

      https://www.researchgate.net/profile/Sheng-Li-60

    • Related Report
      2020 Annual Research Report
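  • [Patent(Industrial Property Rights)] Inference device and training method for the inference device (2020)

    • Inventor(s)
      S. Li, X. Lu, H. Kawai
    • Industrial Property Rights Holder
      National Institute of Information and Communications Technology
    • Industrial Property Rights Type
      Patent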
    • Industrial Property Number
      2020-059962
    • Filing Date
      2020
    • Related Report
      2019 Research-status Report
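  • [Patent(Industrial Property Rights)] Inference device, inference program, and training method (2019)

    • Inventor(s)
      S. Li, X. Lu, C. Ding, T. Kawahara, H. Kawai
    • Industrial Property Rights Holder
      National Institute of Information and Communications Technology
    • Industrial Property Rights Type
      Patent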
    • Industrial Property Number
      2019-163555
    • Filing Date
      2019
    • Related Report
      2019 Research-status Report
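  • [Patent(Industrial Property Rights)] Inference device, training method, and training program (2019)

    • Inventor(s)
      S. Li, X. Lu, R. Dabre, H. Kawai
    • Industrial Property Rights Holder
      National Institute of Information and Communications Technology
    • Industrial Property Rights Type
      Patent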
    • Industrial Property Number
      2019-051008
    • Filing Date
      2019
    • Related Report
      2019 Research-status Report
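  • [Patent(Industrial Property Rights)] Method and device for training a language identification model, and computer program therefor (2019)

    • Inventor(s)
      P. Shen, X. Lu, S. Li, H. Kawai
    • Industrial Property Rights Holder
      National Institute of Information and Communications Technology
    • Industrial Property Rights Type
      Patent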
    • Industrial Property Number
      2019-086005
    • Filing Date
      2019
    • Acquisition Date
      2020
    • Related Report
      2019 Research-status Report
  • [Funded Workshop] Odyssey2020 The Speaker and Language Recognition Workshop (2020)

    • Related Report
      2019 Research-status Report
  • [Funded Workshop] ICASSP2020 (2020)

    • Related Report
      2019 Research-status Report
  • [Funded Workshop] ICME2020 (2020)

    • Related Report
      2019 Research-status Report


Published: 2019-09-03   Modified: 2022-01-27  
