2020 Fiscal Year Annual Research Report

Next generation multilingual End-to-End speech recognition (from G30 to G200)

Research Project

Project/Area Number	19K24376
Research Institution	National Institute of Information and Communications Technology
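Principal Investigator	Li Sheng (李勝), National Institute of Information and Communications Technology, Advanced Speech Translation Research and Development Promotion Center, Advanced Speech Technology Laboratory, Researcher (70840940)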
Project Period (FY)	2019-08-30 – 2021-03-31
Keywords	multilingual modeling / low-resourced modeling / speech translation / multi-unit modeling / language identification / disordered speech / code-switched
Outline of Annual Research Achievements	In FY2020, I focused on accented speech recognition (English and Chinese) and cross-language-family speech recognition. Multilingual speech recognition technologies were also applied to language identification, speaker recognition, disordered speech recognition, and more complex tasks such as speech translation and adversarial attacks. The achievements are as follows: 1. The multilingual modeling technology investigated this year was applied to speaker modeling (1 domestic presentation at IEICE-SP), low-resource transfer learning (1 paper at Interspeech SLIMTS2020), speech translation (1 presentation at NLP2021), language identification (1 journal paper in IEEE-TASLP), and disordered speech recognition (1 paper at Interspeech2020 with an ISCA travel grant, 1 at O-COCOSDA). 2. I also found that acoustic modeling unit selection can enhance single-language speech recognition through multi-unit modeling (1 invited full paper at Interspeech SLIMTS2020, 1 at ICASSP2021) and code-switched speech synthesis (1 at Interspeech SLIMTS2020, 1 ICONIP paper). 3. The following research also benefited from the multilingual modeling technologies: speech separation (1 at Interspeech2020 with an ISCA travel grant), adversarial attacks (1 IEEE-SLT demo paper), voice privacy (1 invited report at Interspeech SLIMTS2020, 1 Interspeech challenge, 1 ACM-CCS demo), voice activity detection (1 at ICASSP2021), and Mandarin tone modeling (1 at ICASSP2021).
Remarks	The paper URLs can be found on the pages listed under Remarks in Research Products.

Research Products
(23 results)

All Int'l Joint Research (1 results) Journal Article (1 results) (of which Peer Reviewed: 1 results) Presentation (16 results) (of which Int'l Joint Research: 14 results, Invited: 4 results) Remarks (5 results)

[Int'l Joint Research] Tianjin University/Xinjiang University/Hithink RoyalFlush AI (China)
- Country Name
  CHINA
- Counterpart Institution
  Tianjin University/Xinjiang University/Hithink RoyalFlush AI
[Journal Article] Knowledge Distillation-based Representation Learning for Short-Utterance Spoken Language Identification (2020)
- Author(s)
  P. Shen, X. Lu, S. Li, H. Kawai
- Journal Title
  IEEE/ACM Trans. Audio, Speech & Language Process.
  Volume: 28 Pages: 2674-2683
- DOI
  10.1109/TASLP.2020.3023627
- Peer Reviewed
[Presentation] Robust voice activity detection using a masked auditory encoder based convolutional neural network (2021)
- Author(s)
  N. Li, L. Wang, M. Unoki, S. Li, R. Wang, M. Ge, J. Dang
- Organizer
  IEEE-ICASSP, 2021
- Int'l Joint Research
[Presentation] An investigation of using hybrid modeling units for improving End-to-End speech recognition systems (2021)
- Author(s)
  S. Chen, X. Hu, S. Li, X. Xu
- Organizer
  IEEE-ICASSP, 2021
- Int'l Joint Research
[Presentation] Encoder-Decoder based pitch tracking and joint model training for Mandarin tone classification (2021)
- Author(s)
  H. Huang, K. Wang, Y. Hu, S. Li
- Organizer
  IEEE-ICASSP, 2021
- Int'l Joint Research
[Presentation] Comparison of End-to-End Models for Joint Speaker and Speech Recognition (2021)
- Author(s)
  K. Soky, S. Li, M. Mimura, C. Chu, T. Kawahara
- Organizer
  IEICE-SP, 2021
[Presentation] Phantom in the Opera: Effective Adversarial Music Attack on Keyword Spotting Systems (2020)
- Author(s)
  H. Zhang, S. Li, X. Ma, Y. Zhao, Y. Cao, T. Kawahara
- Organizer
  IEEE-SLT, 2021
- Int'l Joint Research
[Presentation] Multilingual transformer training for Khmer automatic speech recognition (2020)
- Author(s)
  K. Soky, S. Li, T. Kawahara, S. Seng
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020)
- Int'l Joint Research / Invited
[Presentation] End-to-End Speech Translation with Cross-lingual Transfer Learning (2020)
- Author(s)
  S. Shimizu, C. Chu, S. Li, S. Kurohashi
- Organizer
  NLP, 2021
[Presentation] Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data and mask embedding (2020)
- Author(s)
  S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020)
- Int'l Joint Research / Invited
[Presentation] A Mixture of Character and Word End-to-End System for Keyword Spotting (2020)
- Author(s)
  H. Zhang, S. Ueno, M. Mimura, S. Li, W. Zhang, T. Kawahara
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020) (full paper)
- Int'l Joint Research / Invited
[Presentation] Effectively Synthesizing Code-switched Speech Using Highly Imbalanced Mix-lingual Data (2020)
- Author(s)
  S. Guo, L. Wang, S. Li, J. Zhang, C. Gong, Y. Wang, J. Dang, K. Honda
- Organizer
  In Proc. ICONIP, 2020
- Int'l Joint Research
[Presentation] Staged Knowledge Distillation for End-to-End Dysarthric Speech Recognition and Speech Attribute Transcription (2020)
- Author(s)
  Y. Lin, L. Wang, S. Li, J. Dang, C. Ding
- Organizer
  In Proc. INTERSPEECH, 2020 (ISCA travel grant)
- Int'l Joint Research
[Presentation] VOIS: The First Speech Therapy App in the World for Myanmar Hearing-Impaired Children (2020)
- Author(s)
  A. Thida, N. Han, S. Oo, S. Li, C. Ding
- Organizer
  In Proc. O-COCOSDA, 2020
- Int'l Joint Research
[Presentation] Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release (2020)
- Author(s)
  Y. Han, Y. Cao, S. Li, Q. Ma, M. Yoshikawa
- Organizer
  Interspeech 2020 Satellite Workshop (SLIMTS2020) (invited report)
- Int'l Joint Research / Invited
[Presentation] Voice-Indistinguishability: Protecting Voiceprint with Differential Privacy under an Untrusted Server (2020)
- Author(s)
  Y. Han, Y. Cao, S. Li, Q. Ma, M. Yoshikawa
- Organizer
  ACM Conference on Computer and Communications Security (CCS), demo, 2020
- Int'l Joint Research
[Presentation] System Description for Voice Privacy Challenge (Kyoto Team) (2020)
- Author(s)
  Y. Han, S. Li, Y. Cao, M. Yoshikawa
- Organizer
  Special session of INTERSPEECH 2020 (VoicePrivacy Challenge 2020)
- Int'l Joint Research
[Presentation] Singing Voice Extraction with Attention based Spectrograms Fusion (2020)
- Author(s)
  H. Shi, L. Wang, S. Li, C. Ding, M. Ge, N. Li, J. Dang, H. Seki
- Organizer
  In Proc. INTERSPEECH, 2020 (ISCA travel grant)
- Int'l Joint Research
[Remarks] publication information on DBLP
- URL
  https://dblp.dagstuhl.de/pid/23/3439-10.html
[Remarks] Google scholar homepage
- URL
  https://scholar.google.com/citations?hl=en&user=zHAhs0IAAAAJ
[Remarks] researchmap homepage
- URL
  https://researchmap.jp/listen
[Remarks] NICT researcher's homepage
- URL
  https://ast-astrec.nict.go.jp/aboutus/member/sheng-li/index.html
[Remarks] ResearchGate researcher's homepage
- URL
  https://www.researchgate.net/profile/Sheng-Li-60
