2019 Fiscal Year Annual Research Report

Research on auditory-media signal processing for defending against attacks of media clones

Research Project

Project/Area Number	17H01761
Research Institution	Japan Advanced Institute of Science and Technology
Principal Investigator	Masashi Unoki, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor (00343187)
Co-Investigator (Kenkyū-buntansha)	Masato Akagi, Japan Advanced Institute of Science and Technology, Graduate School of Advanced Science and Technology, Professor (20242571)
Project Period (FY)	2017-04-01 – 2021-03-31
Keywords	auditory media signal processing / media clone / speech synthesis technology / auditory sensing / audio information hiding / audio watermarking
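Outline of Annual Research Achievements	Media that are artificially created from authentic real-world data so as to be nearly indistinguishable from the genuine article are called "media clones." In recent years, media clones have been circulating both in the real world and in cyberspace and are becoming a social threat. In particular, realistic speech produced with speech synthesis technology is being misused for "spoofing" and "tampering," for example to break through voice authentication systems, and is beginning to cause serious social problems. The aim of this research is to establish fundamental auditory-media signal processing technologies that provide appropriate defenses (detection of spoofing and tampering) against media clone attacks on sound signals. This fiscal year, continuing from the previous year, we deepened our understanding of how speech media clones are generated and recognized, and simulated concrete attack methods. Specifically, we proposed a voice conversion method that uses a Variational Autoencoder (VAE) to separate, and control independently, the speaker individuality and the linguistic content of speech. An overall evaluation of voice conversion showed that, although some problems with naturalness remain, the proposed method converts speaker individuality and linguistic content more effectively than conventional methods such as the GMM-based method and the one-hot VAE method. Next, as information hiding methods for preventing speech tampering, we implemented (a) a spread-spectrum speech information hiding method within a linear prediction (LP)-based speech analysis-synthesis system and (b) a speech information hiding method that combines robust principal component analysis with formant enhancement. We also investigated a hybrid speech tampering detection method that uses both methods, and confirmed that these methods are robust against typical tampering attacks. Finally, we investigated the acoustic features that matter most in media clones. In addition to a Spikegram that incorporates auditory masking characteristics, we examined differences in acoustic features in consonant segments and silent intervals in order to tell whether speech was uttered by a human or is a media clone of it.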
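Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Over the research period, after gaining a deep understanding of how speech media clones are generated and recognized, we planned to tackle three issues: (1) clarify which acoustic features are most important when creating speech media clones; (2) clarify the differences in acoustic features between sounds produced by humans and sounds produced by machines; and (3) build a mechanism for embedding imperceptible and robust secret information in acoustic features. In this fiscal year, we continued working on all three issues from the previous year. For issue (1), we used a representative machine learning model (a Variational Autoencoder) to realize a voice conversion method that separates linguistic content from speaker individuality, and we confirmed that this voice conversion method can be used for spoofing. For issue (2), although much remains under investigation, we found that acoustic features observed in the consonants of human speech can be used to discriminate media clones. However, these features become difficult to detect when buried in background sound, so we plan to investigate robust detection methods. Finally, for issue (3), we investigated mechanisms for embedding secret information in features that are important for speech (the sound source and vocal-tract filter characteristics) and in auditory features (masking characteristics and perceptual spectral shape). Based on the above, we judge that the research is proceeding as originally planned.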
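Strategy for Future Research Activity	Our work in FY2019 (Reiwa 1) showed that the acoustic features of sounds produced by humans (not only speech but also body noise from the lips, tongue, and nasal cavity) differ from those of sounds produced by machines (for example, small stationary noise arising from AD/DA conversion, or sounds with fluctuations imperceptible to humans, such as jitter and shimmer). In particular, we made several important findings; for example, features characteristic of the buzz in consonants (plosives), such as the buzz bar (voice bar), are not observed in speech media clones. In FY2020 (Reiwa 2), after investigating speech fingerprinting that exploits these features, we will establish an auditory information hiding method that makes it possible to embed and detect imperceptible and robust secret information in acoustic features. Toward the final year, we aim to make major progress by investigating methods for detecting media clones from secret information embedded in the figure (target sound) and acoustic fingerprints of the ground (background sound).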
Remarks	Construction of auditory media signal processing infrastructure to prevent media clone attacks
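
The Outline above mentions a spread-spectrum speech information hiding method built on an LP-based analysis-synthesis system. The report gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' method: a single LP analysis over the whole signal (no framing), LP order 10, one bit per 1000-sample segment, a ±1 pseudo-noise key drawn from a seeded generator, a watermark level of 0.1 times the residual RMS, and a synthetic AR(2) signal standing in for speech. All function names are hypothetical. The watermark is added to the LP residual (the source estimate) and the signal is resynthesized through the LP synthesis filter; detection is non-blind, meaning the original signal is available and its residual is subtracted before correlating with the key.

```python
import numpy as np


def lpc(x, order):
    """LP analysis (autocorrelation method, Levinson-Durbin recursion).

    Returns the inverse-filter polynomial A = [1, a1, ..., ap] such that the
    prediction residual is e[n] = sum_k A[k] * x[n - k].
    """
    r = np.array([np.dot(x[: len(x) - lag], x[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err  # i-th reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        err *= 1.0 - k * k
    return a


def lp_residual(x, a):
    """Inverse (whitening) filter A(z) with zero initial state: the source estimate."""
    return np.convolve(x, a)[: len(x)]


def lp_synthesis(e, a):
    """All-pole synthesis filter 1/A(z) with zero initial state (inverts lp_residual)."""
    p = len(a) - 1
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = y[max(0, n - p): n][::-1]  # y[n-1], y[n-2], ... (at most p values)
        y[n] = e[n] - np.dot(a[1: 1 + len(past)], past)
    return y


def pn_chips(seg_len, seed):
    """Hypothetical shared key: a +/-1 pseudo-noise sequence from a seeded RNG."""
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=seg_len)


def embed(x, bits, seg_len, alpha=0.1, order=10, seed=1):
    """Add one signed chip block per bit to the LP residual, then resynthesize."""
    a = lpc(x, order)
    e = lp_residual(x, a)
    chips = pn_chips(seg_len, seed)
    w = np.zeros(len(x))
    for i, b in enumerate(bits):
        w[i * seg_len: (i + 1) * seg_len] = (2 * b - 1) * chips  # 1 -> +chips, 0 -> -chips
    w *= alpha * np.sqrt(np.mean(e ** 2))  # watermark level relative to residual RMS
    return lp_synthesis(e + w, a)


def detect_non_blind(y, x, n_bits, seg_len, order=10, seed=1):
    """Non-blind detection: subtract the original residual, then correlate with the key."""
    a = lpc(x, order)
    d = lp_residual(y, a) - lp_residual(x, a)  # estimate of the embedded sequence
    chips = pn_chips(seg_len, seed)
    return [int(np.dot(d[i * seg_len: (i + 1) * seg_len], chips) > 0) for i in range(n_bits)]


# Demo on a synthetic AR(2) signal standing in for speech (assumption).
rng = np.random.default_rng(0)
x = lp_synthesis(rng.standard_normal(8000), np.array([1.0, -1.3, 0.8]))
bits = [1, 0, 1, 1, 0, 0, 1, 0]
y = embed(x, bits, seg_len=1000)
print(detect_non_blind(y, x, len(bits), seg_len=1000))  # -> [1, 0, 1, 1, 0, 0, 1, 0]
```

Because the detector subtracts the original residual, host interference cancels and even a weak watermark is recovered; a blind detector would instead have to treat the host residual as noise.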

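The Strategy section contrasts human and machine-produced sound through cues such as jitter and shimmer. The report does not state how these were measured; a common choice is the "local" definitions: the mean absolute difference between consecutive cycle periods (jitter) or consecutive cycle peak amplitudes (shimmer), divided by the mean period or mean amplitude. The sketch below assumes those definitions and estimates cycle boundaries from linearly interpolated positive-going zero crossings, a crude method that only suits clean, strongly periodic signals; real speech would need a proper pitch-mark extractor. The function names and the 100 Hz test tones are illustrative assumptions.

```python
import numpy as np


def cycle_boundaries(x):
    """Sub-sample times of positive-going zero crossings (linear interpolation)."""
    i = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    return i + x[i] / (x[i] - x[i + 1])


def periods_and_peaks(x, fs):
    """Per-cycle period in seconds and peak amplitude between consecutive crossings."""
    t = cycle_boundaries(x)
    periods = np.diff(t) / fs
    peaks = np.array(
        [x[int(np.ceil(a)): int(np.ceil(b))].max() for a, b in zip(t[:-1], t[1:])]
    )
    return periods, peaks


def local_jitter(periods):
    """Mean absolute difference of consecutive periods divided by the mean period."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)


def local_shimmer(peaks):
    """Mean absolute difference of consecutive peak amplitudes divided by the mean amplitude."""
    a = np.asarray(peaks, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)


fs = 16000
n = np.arange(fs // 10)  # 100 ms of a 100 Hz tone (160 samples per cycle)
clean = np.sin(2 * np.pi * 100 * n / fs)
modulated = clean * np.where((n // 160) % 2 == 0, 1.0, 0.8)  # every other cycle at 0.8

p_clean, a_clean = periods_and_peaks(clean, fs)
p_mod, a_mod = periods_and_peaks(modulated, fs)
print(local_jitter(p_clean), local_shimmer(a_clean))  # both essentially zero
print(local_jitter(p_mod), local_shimmer(a_mod))      # shimmer = 0.2 / 0.9, about 0.222
```
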
Research Products
(14 results)

Journal Article (7 results) (of which Int'l Joint Research: 4 results, Peer Reviewed: 7 results, Open Access: 5 results); Presentation (6 results) (of which Int'l Joint Research: 3 results, Invited: 1 result); Remarks (1 result)

[Journal Article] Non-Blind Speech Watermarking Method Based on Spread-Spectrum Using Linear Prediction Residue (2020)
- Author(s)
  Reiya Namikawa and Masashi Unoki
- Journal Title
  
  IEICE Transactions on Information and Systems
  
  Volume: E103.D Pages: 63-66
- DOI
  https://doi.org/10.1587/transinf.2019MUL0003
- Peer Reviewed / Open Access
[Journal Article] 聴覚特性に基づいた音響情報ハイディング技術 [Audio Information Hiding Technology Based on Auditory Characteristics] (2020)
- Author(s)
  Masashi Unoki
- Journal Title
  
  IEICE ESS Fundamentals Review
  
  Volume: 13 Pages: 284-293
- DOI
  https://doi.org/10.1587/essfr.13.4_284
- Peer Reviewed / Open Access
[Journal Article] Speech Watermarking Based on Source-filter Model of Speech Production (2019)
- Author(s)
  Shengbei Wang, Weitao Yuan, Jianming Wang, and Masashi Unoki
- Journal Title
  
  Journal of Information Hiding and Multimedia Signal Processing
  
  Volume: 10(14) Pages: 517-534
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Detection of speech tampering using sparse representations and spectral manipulations based information hiding (2019)
- Author(s)
  Shengbei Wang, Weitao Yuan, Jianming Wang, and Masashi Unoki
- Journal Title
  
  Speech Communication
  
  Volume: 112 Pages: 1-14
- DOI
  https://doi.org/10.1016/j.specom.2019.06.004
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Feasibility of Audio Information Hiding Using Linear Time Variant IIR Filters Based on Cochlear Delay (2019)
- Author(s)
  Candy Olivia Mawalim and Masashi Unoki
- Journal Title
  
  Journal of Signal Processing
  
  Volume: 23 Pages: 155-158
- DOI
  https://doi.org/10.2299/jsp.23.155
- Peer Reviewed / Open Access
[Journal Article] Data Augmentation for Monaural Singing Voice Separation Based on Variational Autoencoder-Generative Adversarial Network (2019)
- Author(s)
  Boxin He, Shengbei Wang, Weitao Yuan, Jianming Wang, and Masashi Unoki
- Journal Title
  
  Proc. ICME2019
  
  Volume: - Pages: 1354-1359
- DOI
  https://doi.org/10.1109/ICME.2019.00235
- Peer Reviewed / Int'l Joint Research
[Journal Article] Inaudible Speech Watermarking Based on Self-compensated Echo-hiding and Sparse Subspace Clustering (2019)
- Author(s)
  Shengbei Wang, Weitao Yuan, Jianming Wang, and Masashi Unoki
- Journal Title
  
  Proc. ICASSP2019
  
  Volume: - Pages: 2632-2636
- DOI
  https://doi.org/10.1109/ICASSP.2019.8682352
- Peer Reviewed / Int'l Joint Research
[Presentation] Speech communication with affective speech-to-speech translation (2019)
- Author(s)
  Masato Akagi
- Organizer
  National Conference on Man-Machine Speech Communication (NCMMSC2019)
- Int'l Joint Research / Invited
[Presentation] Audio Information Hiding based on Cochlear Delay Characteristics with Optimized Segment Selection (2019)
- Author(s)
  Candy Olivia Mawalim and Masashi Unoki
- Organizer
  2019 3rd International Conference on Security with Intelligent Computing and Big-data Services (SICBS 2019)
- Int'l Joint Research
[Presentation] Non-parallel Voice Conversion with Controllable Speaker Individuality using Variational Autoencoder (2019)
- Author(s)
  Tuan Vu Ho and Masato Akagi
- Organizer
  APSIPA2019
- Int'l Joint Research
[Presentation] Cochlear delay based audio information hiding with segment selection optimization (2019)
- Author(s)
  Candy Olivia Mawalim and Masashi Unoki
- Organizer
  IEICE Technical Committee on EMM, Tohoku University
[Presentation] Study on cochlear-delay based audio information hiding by linear time-variant IIR filter (2019)
- Author(s)
  Candy Olivia Mawalim and Masashi Unoki
- Organizer
  Acoustical Society of Japan, 2019 Autumn Meeting, Ritsumeikan University
[Presentation] 蝸牛遅延に基づいた線形時変IIRフィルタによる音響情報ハイディング [Audio Information Hiding Using Linear Time-Variant IIR Filters Based on Cochlear Delay] (2019)
- Author(s)
  Candy Olivia Mawalim and Masashi Unoki
- Organizer
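  FY2019 Joint Convention of the Hokuriku Chapters of Electrical and Information Societies, National Institute of Technology, Ishikawa College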
[Remarks] Science Impact
- URL
  https://www.ingentaconnect.com/content/sil/impact/2020/00002020/00000002/art00008
