2021 Fiscal Year Research-status Report
Speech privacy protection by high-quality, invertible, and extendable speech anonymization and de-anonymization
Project/Area Number: 21K17775
Research Institution: National Institute of Informatics
Principal Investigator: Wang Xin, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (60843141)
Project Period (FY): 2021-04-01 – 2024-03-31
Keywords: speech privacy / speaker anonymization / speech waveform modeling / neural network / deep learning
Outline of Annual Research Achievements
The first year's work on speaker anonymization consists of three parts. Part 1) Following the research plan, the flow-based invertible anonymization system was implemented, and experiments were conducted on the VoicePrivacy 2020 evaluation platform. As expected, anonymized speech could be de-anonymized (i.e., inverted back to the original waveform), and a speaker verification system recognized the de-anonymized waveform with accuracy similar to that on the original waveform. The word error rate was also similar. However, the anonymized speech still contained speaker information, and its anonymization performance was worse than that of the baseline. Furthermore, the quality of the anonymized speech was degraded. Thus, the first version of the flow-based anonymization system needs improvement.
Part 2) Although this was not included in the research plan, I contributed to the VoicePrivacy 2022 Challenge by building new baseline speaker anonymization models. These models differ from the flow-based model above: they combine the neural waveform model (KAKENHI 19K24371) with the latest generative-adversarial-network-based approaches to speech modeling. The baseline models are freely available (see https://www.voiceprivacychallenge.org).
Part 3) A new language-independent speaker anonymization system was proposed and accepted to the Odyssey 2022 workshop. Although this system is not designed to be reversible, its advantage is that, unlike the systems built in Part 2), it does not require a language-dependent speech recognizer. Thus, it can be directly applied to other languages such as Mandarin.
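The invertibility described in Part 1) can be illustrated with a toy affine coupling flow, a common building block of flow-based models. This is a minimal sketch under stated assumptions, not the model used in this project: the conditioner is a hypothetical stand-in (random weights seeded by a hypothetical secret `key`) rather than a trained neural network, and the input is a short list of floats rather than a speech waveform. It only shows that the transform can be inverted exactly with the same key; it says nothing about how well speaker information is removed.

```python
import math
import random


def _coupling_params(cond, key, layer):
    """Hypothetical conditioner: derive a log-scale and a shift for each
    element of the transformed half from the untouched half `cond`.
    Random weights seeded by (key, layer) stand in for a trained network."""
    rng = random.Random(f"{key}:{layer}")
    w_s = [rng.uniform(-0.5, 0.5) for _ in cond]
    w_b = [rng.uniform(-1.0, 1.0) for _ in cond]
    log_s = [math.tanh(w * c) for w, c in zip(w_s, cond)]  # bounded log-scale
    shift = [w * c for w, c in zip(w_b, cond)]
    return log_s, shift


def anonymize(x, key, n_layers=4):
    """Forward pass: stack affine coupling layers, swapping halves each time."""
    if len(x) % 2:
        raise ValueError("this toy flow expects an even-length input")
    y, half = list(x), len(x) // 2
    for layer in range(n_layers):
        a, b = y[:half], y[half:]
        log_s, shift = _coupling_params(a, key, layer)
        b = [bi * math.exp(s) + t for bi, s, t in zip(b, log_s, shift)]
        y = b + a  # swap so the next layer transforms the other half
    return y


def deanonymize(y, key, n_layers=4):
    """Inverse pass: undo each coupling layer in reverse order."""
    x, half = list(y), len(y) // 2
    for layer in reversed(range(n_layers)):
        b, a = x[:half], x[half:]  # undo the swap made in the forward pass
        log_s, shift = _coupling_params(a, key, layer)
        b = [(bi - t) * math.exp(-s) for bi, s, t in zip(b, log_s, shift)]
        x = a + b
    return x
```

Because each coupling layer leaves half of its input unchanged, the inverse pass can recompute exactly the same scale and shift from that half, so the mapping stays invertible no matter how complex the conditioner is. Calling `deanonymize(anonymize(x, key=1234), key=1234)` recovers `x` up to floating-point error, while a different key does not.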
Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
Reason
As planned for the first year, the flow-based invertible anonymization model was implemented and tested. An input waveform can be anonymized and then de-anonymized. The de-anonymized waveform retains the original speaker information and has high quality (i.e., low word error rate). Thus, the goal of invertibility was partially achieved. However, the anonymized speech has degraded quality, and it still contains much speaker information. In short, while the de-anonymization performance is satisfactory, the anonymization performance is limited.
Most of the effort was devoted to organizing the VoicePrivacy Challenge 2022 (https://www.voiceprivacychallenge.org). Supported by this KAKEN project, new baseline models were built and released on GitHub for free access. Unlike the baseline models of the previous challenge, the new baseline models are implemented in PyTorch, a popular deep learning framework, which makes them easier for users to understand and modify. Furthermore, the new baselines incorporate advanced generative adversarial network (GAN)-based neural vocoders, and the perceptual quality of the anonymized audio was improved.
Finally, the new language-independent speaker anonymization system was proposed. It uses a language-independent self-supervised learning (SSL) speech model, instead of a language-dependent speech recognizer, to extract speech content. This is a new direction for speaker anonymization. The paper was accepted to the ISCA Speaker Odyssey 2022 workshop.
Strategy for Future Research Activity
The original research plans were: 1) second year: anonymization of accent and other speaker-related information; 2) third year: joint optimization of the speaker anonymization system with automatic speech recognition (ASR), automatic speaker verification (ASV), and other components that recognize speaker-related information.
Based on the findings of the first year, we plan to focus on the language-independent anonymization framework in the second year, building on the paper accepted to the Odyssey 2022 workshop. This new framework requires no language-dependent components (such as ASR), and it is relatively easy to extend to anonymize other speaker attributes such as accent and ethnicity.
The third year's plan was slightly revised because ASR is not necessary in the new language-independent speaker anonymization framework. Instead, the framework uses a self-supervised learning (SSL) speech model to extract speech content from the input waveform. Thus, joint optimization will be conducted on the SSL model and the rest of the anonymization system.
Causes of Carryover
The budget for purchasing a GPU card was not spent because of the global semiconductor shortage. We plan to purchase this hardware, or other CPU/GPU servers, in the next fiscal year if possible.
The budget for travel to international conferences was not spent because of the pandemic. We plan to attend international conferences in person from September 2022 onward if the situation improves.
Research Products
(13 results)
[Journal Article] The VoicePrivacy 2020 Challenge: Results and findings (2022)
Author(s): Tomashenko Natalia, Wang Xin, Vincent Emmanuel, Patino Jose, Srivastava Brij Mohan Lal, Noé Paul-Gauthier, Nautsch Andreas, Evans Nicholas, Yamagishi Junichi, O’Brien Benjamin, Chanclu Anaïs, Bonastre Jean-François, Todisco Massimiliano, Maouche Mohamed
Journal Title: Computer Speech & Language
Volume: 74
Pages: 101362
Peer Reviewed / Int'l Joint Research
[Presentation] Benchmarking and challenges in security and privacy for voice biometrics (2021)
Author(s): Jean-Francois Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noé, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi
Organizer: 2021 ISCA Symposium on Security and Privacy in Speech Communication
Int'l Joint Research