2021 Fiscal Year Research-status Report

Phantom in the Opera: the Vulnerabilities of Speech Interface for Robotic Dialogue System

Research Project

Project/Area Number 21K17837
Research Institution National Institute of Information and Communications Technology

Principal Investigator

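Sheng Li (李 勝), National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Researcher (70840940)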

Project Period (FY) 2021-04-01 – 2023-03-31
Keywords adversarial attacks / speech recognition / speech enhancement
Outline of Annual Research Achievements

Despite COVID-19, the project has been productive and has progressed as planned. Over the last two years, we tracked new deep neural network-based models and new attack methods. To protect the system from attacks, we are particularly interested in applying existing technologies, e.g., speech enhancement or adaptation. This year, the research focused on investigating the potential of speech enhancement. Papers from this research were accepted by journals and top conferences. Next year, we will continue to build speech recognition systems with recent popular models and attack methods. Reliable, easy-to-implement methods, e.g., speech enhancement, will also be investigated to protect the system from adversarial attacks.

Current Status of Research Progress

1: Research has progressed more than it was originally planned.

Reason

Progress this year is as follows:
We constructed speech recognition systems with recent popular training toolkits and neural network types (accepted in journals and conferences, e.g., ICASSP 2022).
We surveyed current attack methods and implemented robust adversarial attacks using Kaldi-based ASR systems. We also found that this framework can be used to protect sensitive speech content (accepted at LREC 2022).
On the defense side, we found that adversarial audio is very sensitive. Moreover, its spectrogram features differ markedly from those of the human voice, so it can be treated as a special kind of noise. We constructed speech enhancement systems and studied their mechanisms this year (accepted in journals and conferences, e.g., ICASSP 2022).

Strategy for Future Research Activity

Next year, we will continue to build speech recognition systems with recent popular models and attack methods, using state-of-the-art frameworks, e.g., the Transformer.
To defend against the attacks, we are particularly interested in applying existing technologies, e.g., speech enhancement or adaptation.
We expect to publish papers in journals and conferences.

Causes of Carryover

Last year, because of COVID-19, all international conferences and academic visits were canceled. I did not spend the funding and mainly conducted research activities online.

This year, in line with business regulations, I will continue to limit business travel. The funding will therefore be spent on purchasing equipment (e.g., a spoken dialogue robot, a database, a musical instrument) and on publication costs (e.g., books, conference papers, and journal papers).

  • Research Products

    (24 results)

Int'l Joint Research (2 results) Journal Article (2 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 2 results, Open Access: 2 results) Presentation (16 results) (of which Int'l Joint Research: 16 results) Remarks (4 results)

  • [Int'l Joint Research] Tianjin University/Xinjiang University/Royal Flush AI Research Inc. (China)

    • Country Name
      CHINA
    • Counterpart Institution
      Tianjin University/Xinjiang University/Royal Flush AI Research Inc.
  • [Int'l Joint Research] Nanyang Technological University (Singapore)

    • Country Name
      SINGAPORE
    • Counterpart Institution
      Nanyang Technological University
  • [Journal Article] Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling (2022)

    • Author(s)
      Qin Siqing、Wang Longbiao、Li Sheng、Dang Jianwu、Pan Lixin
    • Journal Title

      EURASIP Journal on Audio, Speech, and Music Processing

      Volume: - Pages: 1-10

    • DOI

      10.1186/s13636-021-00233-4

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Adversarial Attack and Defense on Deep Neural Network-Based Voice Processing Systems: An Overview (2021)

    • Author(s)
      Chen Xiaojiao、Li Sheng、Huang Hao
    • Journal Title

      Applied Sciences

      Volume: 11 Pages: 8450-8450

    • DOI

      10.3390/app11188450

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Self-Adaptive Multilingual ASR Rescoring with Language Identification and Unified Language Model (2022)

    • Author(s)
      Z. Gong, D. Saito, L. Yang, T. Shinozaki, S. Li, H. Kawai and N. Minematsu
    • Organizer
      ISCA-Odyssey (The Speaker and Language Recognition Workshop)
    • Int'l Joint Research
  • [Presentation] Adversarial Speech Generation and Natural Speech Recovery for Speech Content Protection (2022)

    • Author(s)
      S. Li, J. Li, Q. Liu and Z. Gong
    • Organizer
      LREC (Language Resources and Evaluation Conference)
    • Int'l Joint Research
  • [Presentation] Compressing Transformer-based ASR Model by Task-driven Loss and Attention-based Multi-level Feature Distillation (2022)

    • Author(s)
      Y. Lv, L. Wang, M. Ge, S. Li, C. Ding, L. Pan, Y. Wang, J. Dang, K. Honda
    • Organizer
      in Proc. IEEE-ICASSP, pp. 7992-7996, 2022.
    • Int'l Joint Research
  • [Presentation] Mining Hard Samples Locally and Globally for Improved Speech Separation (2022)

    • Author(s)
      K. Wang, Y. Peng, H. Huang, Y. Hu, and S. Li
    • Organizer
      in Proc. IEEE-ICASSP, pp. 6037-6041, 2022.
    • Int'l Joint Research
  • [Presentation] The System Description for VoiceMOS Challenge 2022 (KK team, main/ood tasks) (2022)

    • Author(s)
      S. Li, R. Dabre, R. Raphael, W. Zhou, Z. Yang, C. Chu, Y. Zhao
    • Organizer
      VoiceMOS Challenge 2022
    • Int'l Joint Research
  • [Presentation] Spectrograms Fusion-based End-to-End Robust Automatic Speech Recognition (2021)

    • Author(s)
      H. Shi, L. Wang, S. Li, C. Fan, J. Dang, and T. Kawahara
    • Organizer
      In Proc. APSIPA ASC, pp. 438-442, 2021.
    • Int'l Joint Research
  • [Presentation] Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework (2021)

    • Author(s)
      Y. Peng, J. Zhang, H. Zhang, H. Xu, H. Huang, S. Li, and E.S. Chng
    • Organizer
      In Proc. APSIPA ASC, pp. 1043-1048, 2021.
    • Int'l Joint Research
  • [Presentation] On the Use of Speaker Information for Automatic Speech Recognition in Speaker-imbalanced Corpora (2021)

    • Author(s)
      K. Soky, S. Li, M. Mimura, C. Chu, and T. Kawahara
    • Organizer
      In Proc. APSIPA ASC, pp. 433-437, 2021.
    • Int'l Joint Research
  • [Presentation] An End-to-End Dialect Identification System with Transfer Learning from a Multilingual Automatic Speech Recognition Model (2021)

    • Author(s)
      D. Wang, S. Ye, X. Hu, S. Li, and X. Xu
    • Organizer
      in Proc. INTERSPEECH, pp. 3266-3270, 2021.
    • Int'l Joint Research
  • [Presentation] End-to-End Speech Separation Using Orthogonal Representation in Complex and Real Time-Frequency Domain (2021)

    • Author(s)
      K. Wang, H. Huang, Y. Hu, Z. Huang, and S. Li
    • Organizer
      in Proc. INTERSPEECH, pp. 3046-3050, 2021.
    • Int'l Joint Research
  • [Presentation] The RoyalFlush-NICT System Description for AP21-OLR Challenge (Silk-road team, full tasks) (2021)

    • Author(s)
      D. Wang, S. Ye, X. Hu, S. Li
    • Organizer
      OLR2021 (oriental language recognition challenge)
    • Int'l Joint Research
  • [Presentation] System description of Alzheimer's disease early detection (Silk-road team, short speech track) (2021)

    • Author(s)
      W. Wei, R. Wong, S. Li, Y. Guo and H. Huang
    • Organizer
      In special session of NCMMSC2021 (Alzheimer's disease detection challenge), 2021
    • Int'l Joint Research
  • [Presentation] Adversarial Attack and Defense on Deep Neural Network-based Voice Processing Systems: An Overview (2021)

    • Author(s)
      X. Chen, H. Huang, and S. Li
    • Organizer
      National Conference on Man-Machine Speech Communication (NCMMSC), 2021. (The report was selected for publication in Applied Sciences, Special Issue on Machine Speech Communication.)
    • Int'l Joint Research
  • [Presentation] Speech Dereverberation Based on Scale-aware Mean Square Error Loss (2021)

    • Author(s)
      L. Qiang, H. Shi, M. Ge, H. Yin, N. Li, L. Wang, S. Li and J. Dang
    • Organizer
      International Conference on Neural Information Processing (ICONIP2021), pp 55-63, Springer, 2021.
    • Int'l Joint Research
  • [Presentation] Simultaneous Progressive Filtering-based Monaural Speech Enhancement (2021)

    • Author(s)
      H. Yin, L. Qiang, H. Shi, L. Wang, S. Li, M. Ge, G. Zhang and J. Dang
    • Organizer
      International Conference on Neural Information Processing (ICONIP2021), pp 213-221, Springer, 2021.
    • Int'l Joint Research
  • [Presentation] Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS (2021)

    • Author(s)
      D. Liu, L. Wang, S. Li, H. Li, C. Ding, J. Zhang and J. Dang
    • Organizer
      International Conference on Neural Information Processing (ICONIP2021), pp 110-118, Springer, 2021.
    • Int'l Joint Research
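  • [Remarks] Papers published each year as research outcomes of the National Institute of Information and Communications Technology, listed in date order.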

    • URL

      https://www.nict.go.jp/outcome/journals/journals_2021_j.html

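  • [Remarks] Papers published each year as research outcomes of the National Institute of Information and Communications Technology, listed in date order.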

    • URL

      https://www.nict.go.jp/outcome/proceedings/proceedings_2021_j.html

  • [Remarks] Google Scholar profile of Sheng Li

    • URL

      https://scholar.google.com/citations?user=zHAhs0IAAAAJ&hl=en

  • [Remarks] Lab homepage of Sheng Li

    • URL

      https://ast-astrec.nict.go.jp/member/sheng-li/index.html

Published: 2022-12-28  
