2018 Fiscal Year Annual Research Report

PRISM: Speech privacy preservation based on selecting masking

Research Project

Project/Area Number	18H04112
Research Institution	Nagoya Institute of Technology
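Principal Investigator	徳田恵一 (Keiichi Tokuda), Nagoya Institute of Technology, Graduate School of Engineering, Professor (20217483)
Co-Investigator (Kenkyū-buntansha)	山岸順一 (Junichi Yamagishi), National Institute of Informatics, Digital Content and Media Sciences Research Division, Associate Professor (70709352); 南角吉彦 (Yoshihiko Nankaku), Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497); 橋本佳 (Kei Hashimoto), Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (10635907)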
Project Period (FY)	2018-04-01 – 2022-03-31
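Keywords	Speech information processing / Speech privacy / Speech synthesis
Outline of Annual Research Achievements	This research aims to establish PRISM (PRIvacy Selecting Masking), a next-generation speech privacy protection technology that models speech so that the private information it contains can be separated and transformed, making it possible to protect that information selectively within a unified speech-modeling framework. For speech privacy protection in physical space, we focused mainly on speaker identity and utterance content and studied how to generate masking signals that conceal this information. We generated masking signals from synthetic speech of the user, and subjective evaluation experiments showed that synthetic speech in the user's own voice gives a stronger masking effect than white or pink noise. It also gave a stronger masking effect than synthetic speech in another person's voice. For privacy protection of recorded speech data ("speech privacy protection in cyberspace"), we studied speech modeling in which a deep neural network (DNN)-based acoustic model, taking speaker codes, emotion codes, phrase codes, and similar inputs, makes factors such as emotion separable and transformable. We also studied a VQ-VAE-based privacy protection technique that converts words containing private information into noise reflecting the characteristics of the speech. Furthermore, we proposed the neural source-filter (NSF) model and used it to realize speaker anonymization. With the proposed method, the equal error rate (EER) of speaker verification increased from 1% to as much as 34%, showing that speech can be anonymized to some extent against voice-based authentication systems while its quality is preserved. We plan to present these results at international conferences.
Current Status of Research Progress	1: Research has progressed more than it was originally planned. Reason: We have pursued research on speech privacy protection in both physical space and cyberspace from multiple directions and have obtained many results, including in fundamental research. Progress can therefore be said to exceed the original plan.
Strategy for Future Research Activity	This research develops two technologies: privacy protection for recorded speech data ("speech privacy protection in cyberspace") and privacy protection for speech in physical space ("speech privacy protection in physical space"). For cyberspace, we will build on the findings so far to study speech modeling techniques that selectively separate and transform multiple types of private information. We also found that NSF-based speaker anonymization, while increasing the equal error rate of the speaker verification system, also increases the word error rate of speech recognition systems. This suggests that the anonymization process may alter not only the speaker characteristics of the speech but also its phonetic content. We will therefore investigate network structures that better separate phonetic content from speaker characteristics and allow each to be controlled independently. For physical space, we have so far studied the generation of masking signals that conceal mainly speaker identity and utterance content. Next, we will evaluate perceived privacy from the user's standpoint and whether users feel that the masking signal disturbs people around them, and will work toward more advanced masking-signal generation.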
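The equal error rate quoted above is the operating point at which a speaker verification system's false-acceptance rate and false-rejection rate coincide; a higher EER after anonymization means the system separates genuine from impostor trials less reliably (an EER of 50% is chance level). The following is a minimal Python sketch of how EER can be estimated from verification scores, assuming that higher scores mean "same speaker" and using a simple threshold sweep rather than the interpolated ROC/DET computation that evaluation toolkits typically use. The function name and the toy scores are hypothetical and are not taken from the report.

```python
from typing import Sequence, Tuple


def compute_eer(target_scores: Sequence[float],
                nontarget_scores: Sequence[float]) -> Tuple[float, float]:
    """Estimate the equal error rate by sweeping a decision threshold.

    A trial is accepted as "same speaker" when its score >= threshold.
    FAR: fraction of non-target (impostor) trials that are accepted.
    FRR: fraction of target (genuine) trials that are rejected.
    Returns (eer, threshold) at the observed score where |FAR - FRR| is smallest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    # Every observed score is a candidate threshold.
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Average the two rates as a simple estimate of the crossing point.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Hypothetical toy scores: well separated before anonymization,
# overlapping after it.
before = compute_eer([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4])
after = compute_eer([0.9, 0.8, 0.5, 0.3], [0.6, 0.4, 0.2, 0.1])
print(before, after)  # (0.0, 0.6) (0.25, 0.5)
```

Anonymization that succeeds against a verifier pushes genuine-trial scores down into the impostor range, which is what the rise in EER reflects.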
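The trade-off described above can be tracked with two metrics computed on the same anonymized speech: EER from a speaker verification system, which anonymization should raise, and word error rate (WER) from a speech recognizer, which ideally should not change. WER is the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The following is a minimal Python sketch assuming whitespace-separated words; the function name and example sentences are hypothetical. For Japanese, which is not written with spaces, a character error rate or a separate word segmentation step would typically be used instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit) distance; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))  # empty reference: j insertions
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)   # empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # delete reference word r
                          curr[j - 1] + 1,      # insert hypothesis word h
                          prev[j - 1] + cost)   # match or substitute
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution (cat -> bat) and one deletion ("the"),
# i.e. 2 edits over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the bat sat on mat"))
```

If WER on anonymized speech rises alongside EER, as the report observed, the anonymization is changing phonetic content as well as speaker characteristics.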

Research Products
(26 results)

Journal Article (1 result) (of which Peer Reviewed: 1 result, Open Access: 1 result); Presentation (25 results) (of which Int'l Joint Research: 15 results)

[Journal Article] Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Neural Vocoder (2018)
- Author(s)
  Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
- Journal Title
  IEEE Access
  Volume: 6 Pages: 60478-60488
- DOI
  10.1109/ACCESS.2018.2872060
- Peer Reviewed / Open Access
[Presentation] Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics (2019)
- Author(s)
  Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks (2019)
- Author(s)
  Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language (2019)
- Author(s)
  Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Neural source-filter-based waveform model for statistical parametric speech synthesis (2019)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] STFT spectral loss for training a neural speech waveform model (2019)
- Author(s)
  Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
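[Presentation] A study on computational cost reduction for DNN-based speech synthesis using hidden semi-Markov model structures (2019)
- Author(s)
  島田基樹, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2019 Spring Meeting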
[Presentation] A deep neural network-based speech vocoder driven by periodic and aperiodic signals (2019)
- Author(s)
  藤本崇人, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2019 Spring Meeting
[Presentation] A study on singing voice synthesis using generative adversarial networks (2019)
- Author(s)
  法野行哉, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2019 Spring Meeting
[Presentation] A study on adversarial training for DNN-based emotional speech synthesis (2019)
- Author(s)
  角谷健太, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2019 Spring Meeting
[Presentation] Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos (2019)
- Author(s)
  Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Singing voice synthesis based on generative adversarial networks (2019)
- Author(s)
  Yukiya Hono, Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Singing Voice Conversion Using Posted Waveform Data on Music Social Media (2018)
- Author(s)
  Koki Senda, Yukiya Hono, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Int'l Joint Research
[Presentation] Recent Development of the DNN-based Singing Voice Synthesis System -- Sinsy (2018)
- Author(s)
  Yukiya Hono, Shumma Murata, Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Int'l Joint Research
[Presentation] Speech Synthesis Using WaveNet Vocoder Based on Periodic/Aperiodic Decomposition (2018)
- Author(s)
  Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Int'l Joint Research
[Presentation] Speaker Adaptation for Speech Synthesis Based on Deep Neural Networks Using Hidden Semi-Markov Model Structures (2018)
- Author(s)
  Kento Nakao, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Int'l Joint Research
[Presentation] Discriminative feature extraction based on sequential variational autoencoder for speaker recognition (2018)
- Author(s)
  Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2018)
- Int'l Joint Research
[Presentation] The NITech text-to-speech system for the Blizzard Challenge 2018 (2018)
- Author(s)
  Kei Sawada, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
- Organizer
  Blizzard Challenge 2018 Workshop
- Int'l Joint Research
[Presentation] A study on speaker adaptation for neural network-based speech synthesis considering temporal structure (2018)
- Author(s)
  中尾健人, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Technical Committee on Speech (SP)
[Presentation] A study on Sequential VAE-based feature extraction for speaker recognition (2018)
- Author(s)
  吉村建慶, 小池なつみ, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2018 Autumn Meeting
[Presentation] A study on sound masking for speech input with smart devices in public spaces (2018)
- Author(s)
  次井貴浩, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2018 Autumn Meeting
[Presentation] Speech synthesis using a WaveNet vocoder based on periodic/aperiodic component separation (2018)
- Author(s)
  藤本崇人, 吉村建慶, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2018 Autumn Meeting
[Presentation] Deep neural network-based singing voice synthesis system -- Sinsy (2018)
- Author(s)
  法野行哉, 村田舜馬, 中村和寛, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2018 Autumn Meeting
[Presentation] The NITech text-to-speech system for the Blizzard Challenge 2018 (2018)
- Author(s)
  沢田慶, 吉村建慶, 橋本佳, 大浦圭一郎, 南角吉彦, 徳田恵一
- Organizer
  Acoustical Society of Japan, 2018 Autumn Meeting
[Presentation] Transforming acoustic characteristics to deceive playback spoofing countermeasures of speaker verification systems (2018)
- Author(s)
  Fuming Fang, Junichi Yamagishi, Isao Echizen, Md Sahidullah, Tomi Kinnunen
- Organizer
  WIFS2018: IEEE International Workshop on Information Forensics and Security
- Int'l Joint Research
[Presentation] Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems (2018)
- Author(s)
  Hieu-Thi Luong, Junichi Yamagishi
- Organizer
  2018 IEEE Workshop on Spoken Language Technology (SLT 2018)
- Int'l Joint Research
