
Phantom in the Opera: the Vulnerabilities of Speech Interface for Robotic Dialogue System

Research Project

Project/Area Number 21K17837
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 61050: Intelligent robotics-related
Research Institution National Institute of Information and Communications Technology

Principal Investigator

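Li Sheng  National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Researcher (70840940)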

Project Period (FY) 2021-04-01 – 2023-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Fiscal Year 2022: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2021: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Keywords speech recognition / adversarial attack / privacy preserving / deepfake detection / spoken dialogue / federated learning / security / quality estimation / spoken dialogue system / speech enhancement / dialogue robotic system / deep neural network
Outline of Research at the Start

As the most natural human-machine interface, the automatic speech recognition (ASR) module plays a crucial role in recent robot dialogue systems. However, deep neural networks (DNNs) are known to be vulnerable to adversarial examples (attacks), which is a severe problem. This project studies in depth the robustness of the ASR modules of robot dialogue systems.

Outline of Final Research Achievements

In this project, we carefully studied the principles of speech recognition systems and examined possible attacks in detail. We summarized our findings in a review and proposed methods for improving the front end and back end of speech recognition systems.
We then broadened the scope of the research: similar attacks can affect speech-related systems in general, not only speech recognition systems. We also treat adversarial attacks as a particular kind of noise, so that combining traditional speech enhancement, modeling, and post-processing methods during system development can sufficiently deal with such attacks.
Our results were accepted by top journals and conferences in the speech field, such as Interspeech and ICASSP. The achievements of these two years of research were compiled into two books published by NICT (ISBN: 978-4-904020-26-5 and ISBN: 978-4-904020-28-9), which are held in the Kansai-kan of the National Diet Library. These efforts are our contribution to ensuring the security and reliability of AI systems.

Academic Significance and Societal Importance of the Research Achievements

Deep neural networks are developing rapidly, and speech recognition systems are evolving just as quickly. This study aims to give researchers ideas for improving system security as security problems become increasingly severe.

Report

(3 results)
  • 2022 Annual Research Report   Final Research Report ( PDF )
  • 2021 Research-status Report
  • Research Products

    (40 results)


All Int'l Joint Research (2 results) Journal Article (4 results) (of which Int'l Joint Research: 2 results,  Peer Reviewed: 4 results,  Open Access: 4 results) Presentation (28 results) (of which Int'l Joint Research: 28 results) Book (2 results) Remarks (4 results)

  • [Int'l Joint Research] Tianjin University/Xinjiang University/Royal Flush AI Research Inc. (China)

    • Related Report
      2021 Research-status Report
  • [Int'l Joint Research] Nanyang Technological University (Singapore)

    • Related Report
      2021 Research-status Report
  • [Journal Article] Cross-Lingual Transfer Learning for End-to-End Speech Translation (2022)

    • Author(s)
      Shimizu Shuichiro, Chu Chenhui, Li Sheng, Kurohashi Sadao
    • Journal Title

      Journal of Natural Language Processing

      Volume: 29 Issue: 2 Pages: 611-637

    • DOI

      10.5715/jnlp.29.611

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] TriECCC: Trilingual Corpus of the Extraordinary Chambers in the Courts of Cambodia for Speech Recognition and Translation Studies (2022)

    • Author(s)
      Soky Kak, Mimura Masato, Kawahara Tatsuya, Chu Chenhui, Li Sheng, Ding Chenchen, Sam Sethserey
    • Journal Title

      International Journal of Asian Language Processing

      Volume: 31 Issue: 03n04 Pages: 1-21

    • DOI

      10.1142/s2717554522500072

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Improving Low-Resource Tibetan End-to-End ASR by Multilingual and Multilevel Unit Modeling (2022)

    • Author(s)
      Qin Siqing, Wang Longbiao, Li Sheng, Dang Jianwu, Pan Lixin
    • Journal Title

      EURASIP Journal on Audio, Speech, and Music Processing

      Volume: 2022 Issue: 1 Pages: 1-10

    • DOI

      10.1186/s13636-021-00233-4

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Adversarial Attack and Defense on Deep Neural Network-Based Voice Processing Systems: An Overview (2021)

    • Author(s)
      Chen Xiaojiao, Li Sheng, Huang Hao
    • Journal Title

      Applied Sciences

      Volume: 11 Issue: 18 Pages: 8450-8450

    • DOI

      10.3390/app11188450

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] General or Specific? Investigating Effective Privacy Protection in Federated Learning for Speech Emotion Recognition (2023)

    • Author(s)
      Chao Tan, Yang Cao, Sheng Li and Masatoshi Yoshikawa
    • Organizer
      ICASSP
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Domain and Language Adaptation Using Heterogeneous Datasets for Wav2vec2.0-Based Speech Recognition of Low-Resource Language (2023)

    • Author(s)
      Kak Soky, Sheng Li, Chenhui Chu, Tatsuya Kawahara
    • Organizer
      ICASSP
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Relationship Between Speakers' Physiological Structure and Acoustic Speech Signals: Data-Driven Study Based on Frequency-Wise Attentional Neural Network (2022)

    • Author(s)
      Kai Li, Xugang Lu, Masato Akagi, Jianwu Dang, Sheng Li, Masashi Unoki
    • Organizer
      30th European Signal Processing Conference (EUSIPCO)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Leveraging Simultaneous Translation for Enhancing Transcription of Low-resource Language via Cross Attention Mechanism (2022)

    • Author(s)
      Kak Soky, Sheng Li, Masato Mimura, Chenhui Chu, Tatsuya Kawahara
    • Organizer
      INTERSPEECH 2022
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Augmented Adversarial Self-Supervised Learning for Early-Stage Alzheimer's Speech Detection (2022)

    • Author(s)
      Longfei Yang, Wenqing Wei, Sheng Li, Jiyi Li, Takahiro Shinozaki
    • Organizer
      INTERSPEECH 2022
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection (2022)

    • Author(s)
      Kai Li, Sheng Li, Xugang Lu, Masato Akagi, Meng Liu, Lin Zhang, Chang Zeng, Longbiao Wang, Jianwu Dang, Masashi Unoki
    • Organizer
      INTERSPEECH 2022
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Fusion of Self-supervised Learned Models for MOS Prediction (2022)

    • Author(s)
      Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao
    • Organizer
      INTERSPEECH 2022
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Monaural Speech Enhancement Based on Spectrogram Decomposition for Convolutional Neural Network-sensitive Feature Extraction (2022)

    • Author(s)
      Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, Tatsuya Kawahara
    • Organizer
      INTERSPEECH 2022
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Multi-Domain Dialogue State Tracking with Top-k Slot Self Attention (2022)

    • Author(s)
      Longfei Yang, Jiyi Li, Sheng Li, Takahiro Shinozaki
    • Organizer
      SIGdial Meeting Discourse & Dialogue 2022
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] NICT-Tib1: A Public Speech Corpus of Lhasa Dialect for Benchmarking Tibetan Language Speech Recognition Systems (2022)

    • Author(s)
      Kak Soky, Zhuo Gong, Sheng Li
    • Organizer
      25th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Subband-based Spectrogram Fusion for Speech Enhancement by Combining Mapping and Masking Approaches (2022)

    • Author(s)
      Hao Shi, Longbiao Wang, Sheng Li, Jianwu Dang, Tatsuya Kawahara
    • Organizer
      Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Can We Train a Language Model Inside an End-to-End ASR Model? - Investigating Effective Implicit Language Modeling (2022)

    • Author(s)
      Zhuo Gong, Daisuke Saito, Sheng Li, Hisashi Kawai, Nobuaki Minematsu
    • Organizer
      Proceedings of the Second Workshop on When Creative AI Meets Conversational AI
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model (2022)

    • Author(s)
      Z. Gong, D. Saito, L. Yang, T. Shinozaki, S. Li, H. Kawai and N. Minematsu
    • Organizer
      ISCA-Odyssey (The Speaker and Language Recognition Workshop)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection (2022)

    • Author(s)
      S. Li, J. Li, Q. Liu and Z. Gong
    • Organizer
      LREC (Language Resources and Evaluation Conference)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Compressing Transformer-based ASR Model by Task-driven Loss and Attention-based Multi-level Feature Distillation (2022)

    • Author(s)
      Y. Lv, L. Wang, M. Ge, S. Li, C. Ding, L. Pan, Y. Wang, J. Dang, K. Honda
    • Organizer
      In Proc. IEEE-ICASSP, pp. 7992-7996, 2022.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Mining Hard Samples Locally and Globally for Improved Speech Separation (2022)

    • Author(s)
      K. Wang, Y. Peng, H. Huang, Y. Hu, and S. Li
    • Organizer
      In Proc. IEEE-ICASSP, pp. 6037-6041, 2022.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] The System Description for VoiceMOS Challenge 2022 (KK team, main/ood tasks) (2022)

    • Author(s)
      S. Li, R. Dabre, R. Raphael, W. Zhou, Z. Yang, C. Chu, Y. Zhao
    • Organizer
      VoiceMOS Challenge 2022
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Spectrograms Fusion-based End-to-End Robust Automatic Speech Recognition (2021)

    • Author(s)
      H. Shi, L. Wang, S. Li, C. Fan, J. Dang, and T. Kawahara
    • Organizer
      In Proc. APSIPA ASC, pp. 438-442, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework (2021)

    • Author(s)
      Y. Peng, J. Zhang, H. Zhang, H. Xu, H. Huang, S. Li, and E.S. Chng
    • Organizer
      In Proc. APSIPA ASC, pp. 1043-1048, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora (2021)

    • Author(s)
      K. Soky, S. Li, M. Mimura, C. Chu, and T. Kawahara
    • Organizer
      In Proc. APSIPA ASC, pp. 433-437, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model (2021)

    • Author(s)
      D. Wang, S. Ye, X. Hu, S. Li, and X. Xu
    • Organizer
      In Proc. INTERSPEECH, pp. 3266-3270, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain (2021)

    • Author(s)
      K. Wang, H. Huang, Y. Hu, Z. Huang, and S. Li
    • Organizer
      In Proc. INTERSPEECH, pp. 3046-3050, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] The RoyalFlush-NICT System Description for AP21-OLR Challenge (Silk-road team, full tasks) (2021)

    • Author(s)
      D. Wang, S. Ye, X. Hu, S. Li
    • Organizer
      OLR2021 (oriental language recognition challenge)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] System Description of Alzheimer's Disease Early Detection (Silk-road team, short speech track) (2021)

    • Author(s)
      W. Wei, R. Wong, S. Li, Y. Guo and H. Huang
    • Organizer
      In special session of NCMMSC2021 (Alzheimer's disease detection challenge), 2021
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview (2021)

    • Author(s)
      X. Chen, H. Huang, and S. Li
    • Organizer
      National Conference on Man-Machine Speech Communication (NCMMSC), 2021. (This report was selected for publication in Applied Sciences, Special Issue on Machine Speech Communication.)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Speech Dereverberation Based on Scale-aware Mean Square Error Loss (2021)

    • Author(s)
      L. Qiang, H. Shi, M. Ge, H. Yin, N. Li, L. Wang, S. Li and J. Dang
    • Organizer
      International Conference on Neural Information Processing (ICONIP2021), pp. 55-63, Springer, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Simultaneous Progressive Filtering-based Monaural Speech Enhancement (2021)

    • Author(s)
      H. Yin, L. Qiang, H. Shi, L. Wang, S. Li, M. Ge, G. Zhang and J. Dang
    • Organizer
      International Conference on Neural Information Processing (ICONIP2021), pp. 213-221, Springer, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS (2021)

    • Author(s)
      D. Liu, L. Wang, S. Li, H. Li, C. Ding, J. Zhang and J. Dang
    • Organizer
      International Conference on Neural Information Processing (ICONIP2021), pp. 110-118, Springer, 2021.
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Book] Voices of the Himalayas: Investigation of Speech Recognition Technology for the Tibetan Language (2022)

    • Author(s)
      Sheng Li
    • Total Pages
      112
    • Publisher
      NICT
    • ISBN
      9784904020289
    • Related Report
      2022 Annual Research Report
  • [Book] Phantom in the Opera: The Vulnerabilities of Speech-based Artificial Intelligence Systems (2022)

    • Author(s)
      Sheng Li
    • Total Pages
      110
    • Publisher
      NICT
    • ISBN
      9784904020265
    • Related Report
      2022 Annual Research Report
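  • [Remarks] As research outcomes of the National Institute of Information and Communications Technology, the papers published each year are listed in date order.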

    • URL

      https://www.nict.go.jp/outcome/journals/journals_2021_j.html

    • Related Report
      2021 Research-status Report
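  • [Remarks] As research outcomes of the National Institute of Information and Communications Technology, the papers published each year are listed in date order.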

    • URL

      https://www.nict.go.jp/outcome/proceedings/proceedings_2021_j.html

    • Related Report
      2021 Research-status Report
  • [Remarks] Google Scholar profile of Sheng Li

    • URL

      https://scholar.google.com/citations?user=zHAhs0IAAAAJ&hl=en

    • Related Report
      2021 Research-status Report
  • [Remarks] Lab homepage of Sheng Li

    • URL

      https://ast-astrec.nict.go.jp/member/sheng-li/index.html

    • Related Report
      2021 Research-status Report


Published: 2021-04-28   Modified: 2024-01-30  
