2021 Fiscal Year Research-status Report

Speech privacy protection by high-quality, invertible, and extendable speech anonymization and de-anonymization

Research Project

Project/Area Number	21K17775
Research Institution	National Institute of Informatics
Principal Investigator	Wang Xin National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (60843141)
Project Period (FY)	2021-04-01 – 2024-03-31
Keywords	speech privacy / speaker anonymization / speech waveform modeling / neural network / deep learning
Outline of Annual Research Achievements	The first year's work on speaker anonymization consisted of three parts. Part 1) Following the research plan, the flow-based invertible anonymization system was implemented, and experiments were conducted on the VoicePrivacy 2020 evaluation platform. As expected, anonymized speech could be de-anonymized (i.e., inverted back to the original waveform), and the de-anonymized waveforms were recognized by the speaker verification system with accuracy similar to that of the original waveforms. The word error rate was also similar. However, the anonymized speech still contained speaker information, and the system performed worse than the baseline. Furthermore, the quality of the anonymized speech was degraded. Thus, the first version of the flow-based anonymization system needs improvement. Part 2) Although not included in the research plan, I contributed to the VoicePrivacy 2022 Challenge by building new baseline speaker anonymization models. These models differ from the flow-based model above: they combine the neural waveform model (KAKENHI 19K24371) with a recent generative adversarial network (GAN)-based approach to speech modeling. The baseline models are freely available (see https://www.voiceprivacychallenge.org). Part 3) A new language-independent speaker anonymization system was proposed, and the paper was accepted to the Odyssey 2022 workshop. Although this system is not designed to be reversible, its advantage is that, unlike the systems built in Part 2, it does not require a language-dependent speech recognizer. Thus, it can be directly used to anonymize speech in other languages such as Mandarin.
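The invertibility described in Part 1 can be illustrated with a single affine coupling step, a common building block of flow-based models. This is only a minimal sketch under stated assumptions, not the project's implementation: the function names `anonymize`, `deanonymize`, and `_scale_shift` are hypothetical, and the fixed formulas inside `_scale_shift` stand in for a trained neural network. A real flow would stack many such steps and alternate which half is transformed.

```python
import math


def _scale_shift(cond):
    # Hypothetical stand-in for a learned network: any deterministic
    # function of the untouched half keeps the step exactly invertible.
    log_s = [0.5 * math.tanh(c) for c in cond]
    shift = [math.sin(3.0 * c) for c in cond]
    return log_s, shift


def anonymize(x):
    """Forward coupling step: keep the first half, transform the second half."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(x) // 2
    a, b = x[:half], x[half:]
    log_s, shift = _scale_shift(a)
    y_b = [bi * math.exp(s) + t for bi, s, t in zip(b, log_s, shift)]
    return a + y_b


def deanonymize(y):
    """Inverse coupling step: recompute scale and shift, then undo them."""
    if len(y) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(y) // 2
    a, y_b = y[:half], y[half:]
    log_s, shift = _scale_shift(a)
    b = [(yi - t) * math.exp(-s) for yi, s, t in zip(y_b, log_s, shift)]
    return a + b


# Toy "waveform" samples (illustrative values only).
samples = [0.1, -0.3, 0.7, 0.2]
transformed = anonymize(samples)
recovered = deanonymize(transformed)
max_error = max(abs(p - q) for p, q in zip(samples, recovered))
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift and undo the transformation up to floating-point rounding, which is the property that allows the original waveform to be recovered.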
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: As planned for the first year, the flow-based invertible anonymization model was implemented and tested. An input waveform can be anonymized and then de-anonymized. The de-anonymized waveform retains the original speaker information and is of high quality (i.e., it yields a low word error rate). Thus, the goal of invertibility was partially achieved. However, the anonymized speech has degraded quality, and it still contains much speaker information. In short, while the de-anonymization performance is satisfactory, the anonymization performance is limited. Most of the effort was devoted to the organization of the VoicePrivacy Challenge 2022 (https://www.voiceprivacychallenge.org). Supported by this KAKEN project, new baseline models were built and released on GitHub for free access. Unlike the baseline models of the previous challenge, the new baseline models are implemented in PyTorch, a popular deep learning framework, which makes them easier for users to understand and modify. Furthermore, the new baselines incorporate advanced generative adversarial network (GAN)-based neural vocoders, and the anonymized audio quality was perceptually improved. Finally, a new language-independent speaker anonymization system was proposed. It uses a language-independent self-supervised learning (SSL) speech model in place of the language-dependent speech recognizer for speech content extraction. This is a new direction for speaker anonymization. The paper was accepted to the ISCA Speaker Odyssey 2022 workshop.
Strategy for Future Research Activity	The original research plans were: 1) 2nd year: anonymization of accent and other speaker-related information; 2) 3rd year: joint optimization of the speaker anonymization system with automatic speech recognition (ASR), automatic speaker verification (ASV), and other components that recognize speaker-related information. Based on the findings of the 1st year, we plan to focus on the language-independent anonymization framework in the 2nd year, following the paper accepted to the Odyssey 2022 workshop. This new framework requires no language-dependent components (such as ASR), and it is relatively easy to extend to other speaker attributes such as accent and ethnicity. The 3rd year's plan was slightly revised because ASR is not necessary for the new language-independent speaker anonymization framework. Instead, the framework uses a self-supervised learning (SSL) speech model to extract speech content from the input waveform. Thus, joint optimization will be conducted on the SSL model and the rest of the anonymization system.
Causes of Carryover	The budget for purchasing a GPU card was not executed because of the global semiconductor shortage. However, we plan to purchase this hardware or other CPU/GPU servers in the next fiscal year if possible. The budget for traveling to international conferences was not executed because of the pandemic. However, we plan to attend international conferences in person from September 2022, provided that the situation improves.

Research Products
(13 results)


All Int'l Joint Research (2 results) Journal Article (1 results) (of which Int'l Joint Research: 1 results, Peer Reviewed: 1 results) Presentation (7 results) (of which Int'l Joint Research: 7 results, Invited: 1 results) Remarks (3 results)

[Int'l Joint Research] University of Avignon/EURECOM/Universite de Lorraine (France)
- Country Name
  FRANCE
- Counterpart Institution
  University of Avignon/EURECOM/Universite de Lorraine
- # of Other Institutions
  2
[Int'l Joint Research] Naver Corporation (South Korea)
- Country Name
  KOREA (REP. OF KOREA)
- Counterpart Institution
  Naver Corporation
[Journal Article] The VoicePrivacy 2020 Challenge: Results and findings (2022)
- Author(s)
  Tomashenko Natalia、Wang Xin、Vincent Emmanuel、Patino Jose、Srivastava Brij Mohan Lal、Noé Paul-Gauthier、Nautsch Andreas、Evans Nicholas、Yamagishi Junichi、O’Brien Benjamin、Chanclu Anaïs、Bonastre Jean-François、Todisco Massimiliano、Maouche Mohamed
- Journal Title
  
  Computer Speech & Language
  
  Volume: 74 Pages: 101362～101362
- DOI
  10.1016/j.csl.2022.101362
- Peer Reviewed / Int'l Joint Research
[Presentation] Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models (2022)
- Author(s)
  Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko
- Organizer
  Proc. Odyssey 2022 The Speaker and Language Recognition Workshop
- Int'l Joint Research
[Presentation] Estimating the confidence of speech spoofing countermeasure (2022)
- Author(s)
  Wang Xin, Yamagishi Junichi
- Organizer
  ICASSP 2022
- Int'l Joint Research
[Presentation] Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances (2022)
- Author(s)
  Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
- Organizer
  ICASSP 2022
- Int'l Joint Research
[Presentation] Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation (2022)
- Author(s)
  Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans
- Organizer
  Proc. Odyssey 2022 The Speaker and Language Recognition Workshop
- Int'l Joint Research
[Presentation] Investigating self-supervised front ends for speech spoofing countermeasures (2022)
- Author(s)
  Xin Wang, Junichi Yamagishi
- Organizer
  Proc. Odyssey 2022 The Speaker and Language Recognition Workshop
- Int'l Joint Research
[Presentation] Benchmarking and challenges in security and privacy for voice biometrics (2021)
- Author(s)
  Jean-François Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noé, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi
- Organizer
  2021 ISCA Symposium on Security and Privacy in Speech Communication
- Int'l Joint Research
[Presentation] Two speech security issues after the speech synthesis boom (2021)
- Author(s)
  Wang Xin
- Organizer
  Speech Synthesis Forum, China Computer Federation
- Int'l Joint Research / Invited
[Remarks] Official page of VoicePrivacy
- URL
  https://www.voiceprivacychallenge.org/
[Remarks] Open-source baseline of VoicePrivacy 2022
- URL
  https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022
[Remarks] Language-independent speaker anonymization system
- URL
  https://github.com/nii-yamagishilab/SSL-SAS
