Project/Area Number |
21K17837
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 61050: Intelligent robotics-related
|
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
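Li Sheng National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Researcher (70840940)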
|
Project Period (FY) |
2021-04-01 – 2023-03-31
|
Project Status |
Completed (Fiscal Year 2022)
|
Budget Amount |
¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Fiscal Year 2022: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2021: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
|
Keywords | speech recognition / adversarial attack / privacy preserving / deepfake detection / spoken dialogue / federated learning / security / quality estimation / spoken dialogue system / adversarial attacks / speech enhancement / Speech recognition / Dialogue robotic system / Adversarial attack / Deep neural network |
Outline of Research at the Start |
As the most natural human-machine interface, the automatic speech recognition (ASR) module plays a crucial role in recent robot dialogue systems. However, deep neural networks (DNNs) are known to be vulnerable to adversarial examples (or attacks), which is a severe problem. This study will investigate in depth the robustness of the ASR modules of a robot dialogue system.
|
Outline of Final Research Achievements |
In this project, we carefully studied the principles of speech recognition systems and examined the details of possible attacks. We summarized our findings in a review and proposed methods for improving the front-end and back-end of speech recognition systems. We then broadened the scope of our research to a more general view: similar attacks can exist in other speech-related systems, not just speech recognition systems. We also treat adversarial attacks as a particular kind of noise; under this view, combining traditional speech enhancement, modeling, and post-processing methods during system development can sufficiently deal with such attacks. Our results were accepted at top journals and conferences in the speech field, such as Interspeech and ICASSP. The achievements of these two years of research have been included in two books (ISBN: 978-4-904020-26-5, ISBN: 978-4-904020-28-9) published by NICT and stored in the national library in Kansai. These efforts are our contribution to ensuring the security and reliability of AI systems.
|
Academic Significance and Societal Importance of the Research Achievements |
Deep neural networks have been developing rapidly, and speech recognition systems have evolved extremely quickly along with them. This study aims to provide researchers with ideas for improving system security in light of increasingly severe security issues.
|