2021 Fiscal Year Research-status Report

Phantom in the Opera: the Vulnerabilities of Speech Interface for Robotic Dialogue System

Research Project

Project/Area Number	21K17837
Research Institution	National Institute of Information and Communications Technology
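Principal Investigator	李 勝 (LI Sheng), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Researcher (70840940)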
Project Period (FY)	2021-04-01 – 2023-03-31
Keywords	adversarial attacks / speech recognition / speech enhancement
Outline of Annual Research Achievements	Despite the COVID-19 pandemic, the project has been productive and has proceeded as planned. Over the last two years, we have followed new, powerful deep neural network (DNN)-based models and new attack methods. To protect these systems from attacks, we are particularly interested in applying existing technologies, e.g., speech enhancement or adaptation. This year, my research focused on investigating the potential of speech enhancement for this purpose. Papers from our research have been accepted in journals and at top conferences. Next year, we will continue to build concrete speech recognition systems with new, popular models and attack methods, and we will investigate reliable, easy-to-implement methods, e.g., speech enhancement, to protect these systems from adversarial attacks.
Current Status of Research Progress	1: Research has progressed more than originally planned.
Reason: This year's progress is as follows.
- We constructed speech recognition systems with recent popular training toolkits and neural network types (accepted in journals and at conferences, e.g., ICASSP 2022).
- We surveyed current attack methods.
- We implemented robust adversarial attacks on Kaldi-based ASR systems. We were also pleased to find that this framework can be used to protect sensitive speech content (accepted at LREC 2022).
- In studying defenses, we found that adversarial audio is highly sensitive. Moreover, its spectrogram features differ greatly from those of the human voice, so it can be treated as a special kind of noise. This year, we built speech enhancement systems and studied their mechanisms (accepted in journals and at conferences, e.g., ICASSP 2022).
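The observation that adversarial audio can be treated as a special kind of noise suggests why an enhancement front end might act as a defense. The following Python sketch illustrates the idea on a toy problem only. Every name and number in it is a hypothetical assumption: a sine wave stands in for speech, `make_weights` and `score` define a made-up linear scorer, `fgsm_perturbation` is a single FGSM-style signed-gradient attack step, and `smooth` is a 2-tap moving average standing in for speech enhancement. It is not the Kaldi-based ASR system, attack, or enhancement model used in this project.

```python
import math

N = 64       # samples in the toy "utterance" (assumption)
EPS = 0.1    # max per-sample perturbation, i.e. L-infinity budget (assumption)
BIAS = -1.0  # decision threshold of the hypothetical scorer (assumption)


def clean_signal(n_samples):
    """Toy stand-in for speech: one period of a low-frequency sine wave."""
    return [math.sin(2 * math.pi * n / n_samples) for n in range(n_samples)]


def make_weights(template):
    """Hypothetical linear 'recognizer' weights.

    A small matched-filter term responds to the speech-like signal; a
    larger alternating (+/-) term responds only to the highest frequency,
    which the smooth signal barely excites but an attacker can exploit.
    """
    return [0.1 * t + 0.5 * (-1) ** n for n, t in enumerate(template)]


def score(weights, signal):
    """Linear decision score; > 0 means the toy command is accepted."""
    return sum(w * s for w, s in zip(weights, signal)) + BIAS


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturbation(weights, eps):
    """Single signed-gradient (FGSM-style) step that lowers the score.

    For a linear scorer the gradient w.r.t. the input is the weight
    vector itself, so the step is -eps * sign(w); no sample moves by more than eps.
    """
    return [-eps * sign(w) for w in weights]


def smooth(signal):
    """Hypothetical enhancement stand-in: circular 2-tap moving average (low-pass)."""
    # signal[i - 1] wraps around to the last sample when i == 0.
    return [(signal[i] + signal[i - 1]) / 2 for i in range(len(signal))]


x = clean_signal(N)
w = make_weights(x)
delta = fgsm_perturbation(w, EPS)
x_adv = [a + d for a, d in zip(x, delta)]

print(f"clean:    {score(w, x):+.2f}")              # clean:    +2.20
print(f"attacked: {score(w, x_adv):+.2f}")          # attacked: -1.00
print(f"defended: {score(w, smooth(x_adv)):+.2f}")  # defended: +2.19
```

With these toy settings the clean input is accepted, the attacked input is rejected, and smoothing restores acceptance. The perturbation alternates sign from sample to sample, so it lies at the highest frequency, which the moving average cancels, while the smooth signal passes almost unchanged. A real defense would also have to be checked against adaptive attackers who know that the enhancement front end is present.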
Strategy for Future Research Activity	Next year, we will continue to build concrete speech recognition systems with new, popular models and attack methods based on state-of-the-art architectures, e.g., the Transformer. To defend against these attacks, we plan to apply existing technologies, e.g., speech enhancement or adaptation. We expect to publish the results in journals and at conferences.
Causes of Carryover	Last year, because of COVID-19, all international conferences and academic visits were canceled, so I did not spend the funding and mainly carried out research activities online. This year, in accordance with business regulations, I will continue to limit business travel. The funding will therefore be spent on purchasing devices (e.g., a spoken dialogue robot, databases, musical instruments) and on publication fees (e.g., books, conference papers, and journal papers).

Research Products
(24 results)

Int'l Joint Research (2 results), Journal Article (2 results; of which Int'l Joint Research: 2, Peer Reviewed: 2, Open Access: 2), Presentation (16 results; of which Int'l Joint Research: 16), Remarks (4 results)

[Int'l Joint Research] Tianjin University/Xinjiang University/Royal Flush AI Research Inc. (China)
- Country Name
  CHINA
- Counterpart Institution
  Tianjin University/Xinjiang University/Royal Flush AI Research Inc.
[Int'l Joint Research] Nanyang Technological University (Singapore)
- Country Name
  SINGAPORE
- Counterpart Institution
  Nanyang Technological University
[Journal Article] Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling (2022)
- Author(s)
  Qin Siqing, Wang Longbiao, Li Sheng, Dang Jianwu, Pan Lixin
- Journal Title
  EURASIP Journal on Audio, Speech, and Music Processing
  Volume: -, Pages: 1-10
- DOI
  10.1186/s13636-021-00233-4
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Adversarial Attack and Defense on Deep Neural Network-Based Voice Processing Systems: An Overview (2021)
- Author(s)
  Chen Xiaojiao, Li Sheng, Huang Hao
- Journal Title
  Applied Sciences
  Volume: 11, Pages: 8450
- DOI
  10.3390/app11188450
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model (2022)
- Author(s)
  Z. Gong, D. Saito, L. Yang, T. Shinozaki, S. Li, H. Kawai and N. Minematsu
- Organizer
  ISCA-Odyssey (The Speaker and Language Recognition Workshop)
- Int'l Joint Research
[Presentation] Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection (2022)
- Author(s)
  S. Li, J. Li, Q. Liu and Z. Gong
- Organizer
  LREC (Language Resources and Evaluation Conference)
- Int'l Joint Research
[Presentation] Compressing Transformer-based ASR Model by Task-driven Loss and Attention-based Multi-level Feature Distillation (2022)
- Author(s)
  Y. Lv, L. Wang, M. Ge, S. Li, C. Ding, L. Pan, Y. Wang, J. Dang, and K. Honda
- Organizer
  Proc. IEEE ICASSP, pp. 7992-7996, 2022.
- Int'l Joint Research
[Presentation] Mining Hard Samples Locally and Globally for Improved Speech Separation (2022)
- Author(s)
  K. Wang, Y. Peng, H. Huang, Y. Hu, and S. Li
- Organizer
  Proc. IEEE ICASSP, pp. 6037-6041, 2022.
- Int'l Joint Research
[Presentation] The System Description for VoiceMOS Challenge 2022 (KK team, main/ood tasks) (2022)
- Author(s)
  S. Li, R. Dabre, R. Raphael, W. Zhou, Z. Yang, C. Chu, and Y. Zhao
- Organizer
  VoiceMOS Challenge 2022
- Int'l Joint Research
[Presentation] Spectrograms Fusion-based End-to-End Robust Automatic Speech Recognition (2021)
- Author(s)
  H. Shi, L. Wang, S. Li, C. Fan, J. Dang, and T. Kawahara
- Organizer
  Proc. APSIPA ASC, pp. 438-442, 2021.
- Int'l Joint Research
[Presentation] Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework (2021)
- Author(s)
  Y. Peng, J. Zhang, H. Zhang, H. Xu, H. Huang, S. Li, and E.S. Chng
- Organizer
  Proc. APSIPA ASC, pp. 1043-1048, 2021.
- Int'l Joint Research
[Presentation] On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora (2021)
- Author(s)
  K. Soky, S. Li, M. Mimura, C. Chu, and T. Kawahara
- Organizer
  Proc. APSIPA ASC, pp. 433-437, 2021.
- Int'l Joint Research
[Presentation] An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model (2021)
- Author(s)
  D. Wang, S. Ye, X. Hu, S. Li, and X. Xu
- Organizer
  Proc. INTERSPEECH, pp. 3266-3270, 2021.
- Int'l Joint Research
[Presentation] End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain (2021)
- Author(s)
  K. Wang, H. Huang, Y. Hu, Z. Huang, and S. Li
- Organizer
  Proc. INTERSPEECH, pp. 3046-3050, 2021.
- Int'l Joint Research
[Presentation] The RoyalFlush-NICT System Description for AP21-OLR Challenge (Silk-road team, full tasks) (2021)
- Author(s)
  D. Wang, S. Ye, X. Hu, and S. Li
- Organizer
  OLR 2021 (Oriental Language Recognition Challenge)
- Int'l Joint Research
[Presentation] System description of Alzheimer's disease early detection (Silk-road team, short speech track) (2021)
- Author(s)
  W. Wei, R. Wong, S. Li, Y. Guo, and H. Huang
- Organizer
  Special session of NCMMSC 2021 (Alzheimer's Disease Detection Challenge), 2021.
- Int'l Joint Research
[Presentation] Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview (2021)
- Author(s)
  X. Chen, H. Huang, and S. Li
- Organizer
  National Conference on Man-Machine Speech Communication (NCMMSC), 2021 (selected for publication in Applied Sciences, Special Issue on Machine Speech Communication).
- Int'l Joint Research
[Presentation] Speech Dereverberation Based on Scale-aware Mean Square Error Loss (2021)
- Author(s)
  L. Qiang, H. Shi, M. Ge, H. Yin, N. Li, L. Wang, S. Li, and J. Dang
- Organizer
  International Conference on Neural Information Processing (ICONIP 2021), pp. 55-63, Springer, 2021.
- Int'l Joint Research
[Presentation] Simultaneous Progressive Filtering-based Monaural Speech Enhancement (2021)
- Author(s)
  H. Yin, L. Qiang, H. Shi, L. Wang, S. Li, M. Ge, G. Zhang, and J. Dang
- Organizer
  International Conference on Neural Information Processing (ICONIP 2021), pp. 213-221, Springer, 2021.
- Int'l Joint Research
[Presentation] Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS (2021)
- Author(s)
  D. Liu, L. Wang, S. Li, H. Li, C. Ding, J. Zhang, and J. Dang
- Organizer
  International Conference on Neural Information Processing (ICONIP 2021), pp. 110-118, Springer, 2021.
- Int'l Joint Research
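[Remarks] Research outcomes of the National Institute of Information and Communications Technology (NICT): papers published each year, listed in chronological order.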
- URL
  https://www.nict.go.jp/outcome/journals/journals_2021_j.html
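[Remarks] Research outcomes of the National Institute of Information and Communications Technology (NICT): papers published each year, listed in chronological order.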
- URL
  https://www.nict.go.jp/outcome/proceedings/proceedings_2021_j.html
[Remarks] Google Scholar profile of Sheng Li
- URL
  https://scholar.google.com/citations?user=zHAhs0IAAAAJ&hl=en
[Remarks] Lab homepage of Sheng Li
- URL
  https://ast-astrec.nict.go.jp/member/sheng-li/index.html
