FY2021 Research Status Report

Speech privacy protection by high-quality, invertible, and extendable speech anonymization and de-anonymization

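Research Project

Project/Area Number	21K17775
Research Institution	National Institute of Informatics
Principal Investigator	Wang Xin, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (60843141)
Project Period (FY)	2021-04-01 – 2024-03-31
Keywords	speech privacy / speaker anonymization / speech waveform modeling / neural network / deep learning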
Outline of Annual Research Achievements	The first year's work on speaker anonymization consists of three parts. Part 1) Following the research plan, a flow-based invertible anonymization system was implemented, and experiments were conducted on the VoicePrivacy 2020 evaluation platforms. As expected, anonymized speech can be de-anonymized (i.e., inverted back to the original waveform), and the de-anonymized waveforms were recognized by the speaker verification system with accuracy similar to that of the original waveforms. The word error rate was also similar. However, the anonymized speech still contained speaker information and performed worse than the baseline. Furthermore, the quality of the anonymized speech was degraded. Thus, the first version of the flow-based anonymization system needs improvement. Part 2) Although not included in the research plan, I contributed to the VoicePrivacy 2022 Challenge by building new baseline speaker anonymization models. These models differ from the flow-based model above: they combine the neural waveform model (KAKENHI 19K24371) with the latest generative adversarial network (GAN)-based approach to speech modeling. The baseline models have been released for free (see https://www.voiceprivacychallenge.org). Part 3) A new language-independent speaker anonymization system was proposed, and the paper was accepted to the Odyssey 2022 workshop. Although this system is not designed to be reversible, its advantage is that, unlike the systems built in Part 2, it does not require a language-dependent speech recognizer. Thus, it can be directly used to anonymize speech in other languages such as Mandarin.
Current Status of Research Progress (Category)	2: Progressing mostly smoothly. Reason: As planned for the first year, the flow-based invertible anonymization model was implemented and tested. An input waveform can be anonymized and then de-anonymized. The de-anonymized waveform encodes the original speaker information and has high quality (i.e., low word error rate). Thus, the goal of invertibility was partially achieved. However, the anonymized speech has degraded quality, and it still contains much speaker information. In short, while the de-anonymization performance is satisfactory, the anonymization performance is limited. Most of the effort went into organizing the VoicePrivacy Challenge 2022 (https://www.voiceprivacychallenge.org). Supported by this KAKENHI project, new baseline models were built and released on GitHub for free access. Unlike the baseline models of the previous challenge, the new baseline models are built on PyTorch, a popular deep learning framework, which makes them easier for users to understand and modify. Furthermore, the new baselines incorporate advanced generative adversarial network (GAN)-based neural vocoders, and the quality of the anonymized audio was perceptually improved. Finally, a new language-independent speaker anonymization system was proposed. It uses a language-independent self-supervised learning (SSL) speech model in place of the language-dependent speech recognizer for speech content extraction. This is a new direction for speaker anonymization. The paper was accepted to the ISCA Speaker Odyssey 2022 workshop.
Strategy for Future Research Activity	The original research plans were: 1) 2nd year: anonymization of accent and other speaker-related information; 2) 3rd year: joint optimization of the speaker anonymization system with automatic speech recognition (ASR), automatic speaker verification (ASV), and other components that recognize speaker-related information. Based on the findings in the 1st year, we plan to focus on the language-independent anonymization framework in the 2nd year, following the paper accepted to the Odyssey 2022 workshop. This new framework requires no language-dependent components (such as the ASR), and it is relatively easy to extend to other speaker attributes such as accent and ethnicity. The 3rd year's plan was slightly revised because ASR is not necessary for the new language-independent speaker anonymization framework. Instead, the framework uses a self-supervised learning (SSL) speech model to extract speech content from the input speech waveform. Thus, joint optimization will be conducted on the SSL model and the rest of the anonymization system.
Causes of Carryover	The budget for purchasing a GPU card was not executed due to the global semiconductor shortage. However, we plan to purchase this hardware or other CPU/GPU servers in the next fiscal year if possible. The budget for traveling to international conferences was not executed because of the pandemic. However, we plan to attend international conferences in person from September 2022, provided the situation improves.

Research Products
(13 results)

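International Joint Research (2) / Journal Article (1) (of which internationally co-authored: 1, peer reviewed: 1) / Conference Presentation (7) (of which international conferences: 7, invited talks: 1) / Remarks (3)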

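[International Joint Research] University of Avignon/EURECOM/Universite de Lorraine (France)
- Country
  France
- Foreign institution name
  University of Avignon/EURECOM/Universite de Lorraine
- Number of other institutions
  2
[International Joint Research] Naver Corporation (South Korea)
- Country
  South Korea
- Foreign institution name
  Naver Corporation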
[Journal Article] The VoicePrivacy 2020 Challenge: Results and findings (2022)
- Authors/Presenters
  Tomashenko Natalia, Wang Xin, Vincent Emmanuel, Patino Jose, Srivastava Brij Mohan Lal, Noé Paul-Gauthier, Nautsch Andreas, Evans Nicholas, Yamagishi Junichi, O’Brien Benjamin, Chanclu Anaïs, Bonastre Jean-François, Todisco Massimiliano, Maouche Mohamed
- Journal title
  Computer Speech & Language
  Volume: 74, Pages: 101362-101362
- DOI
  10.1016/j.csl.2022.101362
- Peer reviewed / Internationally co-authored
[Conference Presentation] Language-Independent Speaker Anonymization Approach Using Self-Supervised Pre-Trained Models (2022)
- Authors/Presenters
  Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi, Natalia Tomashenko
- Conference name
  Proc. Odyssey 2022 The Speaker and Language Recognition Workshop
- International conference
[Conference Presentation] Estimating the confidence of speech spoofing countermeasure (2022)
- Authors/Presenters
  Wang Xin, Yamagishi Junichi
- Conference name
  ICASSP 2022
- International conference
[Conference Presentation] Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances (2022)
- Authors/Presenters
  Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
- Conference name
  ICASSP 2022
- International conference
[Conference Presentation] Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation (2022)
- Authors/Presenters
  Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans
- Conference name
  Proc. Odyssey 2022 The Speaker and Language Recognition Workshop
- International conference
[Conference Presentation] Investigating self-supervised front ends for speech spoofing countermeasures (2022)
- Authors/Presenters
  Xin Wang, Junichi Yamagishi
- Conference name
  Proc. Odyssey 2022 The Speaker and Language Recognition Workshop
- International conference
[Conference Presentation] Benchmarking and challenges in security and privacy for voice biometrics (2021)
- Authors/Presenters
  Jean-Francois Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noé, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi
- Conference name
  2021 ISCA Symposium on Security and Privacy in Speech Communication
- International conference
[Conference Presentation] Two speech security issues after the speech synthesis boom (2021)
- Authors/Presenters
  Wang Xin
- Conference name
  Speech Synthesis Forum, China Computer Federation
- International conference / Invited talk
[Remarks] Official page of VoicePrivacy
- URL
  https://www.voiceprivacychallenge.org/
[Remarks] Open-source baseline of VoicePrivacy 2022
- URL
  https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022
[Remarks] Language-independent speaker anonymization system
- URL
  https://github.com/nii-yamagishilab/SSL-SAS
