2019 Fiscal Year Annual Research Report

Robust voice cloning technologies in noisy environments and its applications

Research Project

Project/Area Number	17H04687
Research Institution	National Institute of Informatics
Principal Investigator	Junichi Yamagishi (山岸順一), National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (70709352)
Project Period (FY)	2017-04-01 – 2020-03-31
Keywords	Speech synthesis / Digital clone / Deep learning / Speaker adaptation
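Outline of Annual Research Achievements	Speaker adaptation is a "digital voice cloning" technology built on speech synthesis. This research aims to pioneer the core technologies needed to make speech recorded in adverse conditions, rather than speech recorded specifically for speech synthesis, a new target for digital voice cloning. Specifically, the goal is to realize (1) robust speaker adaptation methods that tolerate noise and reverberation and remove the need for expensive recording equipment, and (2) unsupervised speaker adaptation methods. Earlier in the project, so that a voice can be cloned easily even from speech data with no accompanying text, we proposed a new neural network structure called the Multi-modal architecture and showed that it allows speaker adaptation from speech alone. Because the quality of synthetic speech is strongly limited by the vocoder, the component that converts acoustic features into a speech waveform, we also worked intensively on improving the vocoder and proposed a new neural waveform model called the Neural source-filter model. This fiscal year, we proposed a new neural-network-based speech synthesis method that incorporates a "speaker encoder" trained on speech recorded in adverse conditions, and showed that unsupervised speaker adaptation is achievable even from such recordings. To further improve the quality of synthetic speech, we continued refining neural waveform models and proposed improved versions such as the Neural Harmonic-plus-Noise Waveform Model. In addition, we applied the voice cloning technology to real-world data, both to examine its effectiveness and to explore new applications. Specifically, by applying the speaker-encoder-based neural speech synthesis method and the neural waveform models described above to recordings of rakugo performances, we showed that rakugo speech synthesis can reproduce the various characters that a rakugo storyteller (hanashika) voices.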
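The outline names the Neural source-filter model without describing it. In NSF-style models, a source module turns the fundamental frequency (F0) into a sine-based excitation signal, which neural filter blocks then shape into the output waveform. The following is a minimal plain-Python sketch of that source part only, not the authors' implementation: the function name `sine_excitation`, the 16 kHz sampling rate, the 80-sample frame shift, and the values alpha = 0.1 and sigma = 0.003 are illustrative assumptions, and the neural filter module is omitted entirely.

```python
import math
import random


def sine_excitation(f0_frames, sample_rate=16000, hop=80,
                    alpha=0.1, sigma=0.003, seed=0):
    """Hypothetical sketch of an NSF-style source signal from frame-level F0 (Hz).

    Voiced samples (F0 > 0): alpha * sin(accumulated phase) + N(0, sigma^2) noise.
    Unvoiced samples (F0 == 0): Gaussian noise scaled by alpha / (3 * sigma).
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    phase = rng.uniform(-math.pi, math.pi)  # random initial phase
    excitation = []
    for f0 in f0_frames:
        # Nearest-neighbour upsampling: reuse the frame's F0 for `hop` samples.
        for _ in range(hop):
            noise = rng.gauss(0.0, sigma)
            if f0 > 0:
                # Accumulate instantaneous phase, wrapped to [0, 2*pi).
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                excitation.append(alpha * math.sin(phase) + noise)
            else:
                excitation.append(alpha / (3.0 * sigma) * noise)
    return excitation


# 5 unvoiced frames, 10 voiced frames at 120 Hz, 5 unvoiced frames.
f0 = [0.0] * 5 + [120.0] * 10 + [0.0] * 5
e = sine_excitation(f0)
print(len(e))  # 1600
```

With these assumed settings, 20 frames of F0 yield 20 × 80 = 1600 excitation samples. Repeating each frame value is the simplest upsampling choice; interpolating F0 between frames would give a smoother phase trajectory.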
Research Progress Status	Not entered, as FY2019 (Reiwa 1) is the final year of the project.
Strategy for Future Research Activity	Not entered, as FY2019 (Reiwa 1) is the final year of the project.

Research Products
(16 results)

Int'l Joint Research (3 results) Journal Article (9 results) (of which Int'l Joint Research: 3 results, Peer Reviewed: 9 results, Open Access: 9 results) Presentation (4 results) (of which Int'l Joint Research: 1 result, Invited: 2 results)

[Int'l Joint Research] National University of Singapore (Singapore)
- Country Name
  SINGAPORE
- Counterpart Institution
  National University of Singapore
[Int'l Joint Research] Aalto University (Finland)
- Country Name
  FINLAND
- Counterpart Institution
  Aalto University
[Int'l Joint Research] MIT/JHU (U.S.A.)
- Country Name
  U.S.A.
- Counterpart Institution
  MIT/JHU
[Journal Article] Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis (2020)
- Author(s)
  X. Wang, S. Takaki and J. Yamagishi
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 28 Pages: 402-415
- DOI
  https://doi.org/10.1109/TASLP.2019.2956145
- Peer Reviewed / Open Access
[Journal Article] Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings (2020)
- Author(s)
  Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
- Journal Title
  2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Volume: - Pages: 6184-6188
- DOI
  https://doi.org/10.1109/ICASSP40776.2020.9054535
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment (2020)
- Author(s)
  Y. Yasuda, X. Wang and J. Yamagishi
- Journal Title
  2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Volume: - Pages: 6724-6728
- DOI
  https://doi.org/10.1109/ICASSP40776.2020.9053546
- Peer Reviewed / Open Access
[Journal Article] Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet (2019)
- Author(s)
  Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi
- Journal Title
  Proc. Interspeech 2019
  Volume: - Pages: 1298-1302
- DOI
  http://dx.doi.org/10.21437/Interspeech.2019-1357
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram (2019)
- Author(s)
  Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
- Journal Title
  Proc. Interspeech 2019
  Volume: - Pages: 694-698
- DOI
  http://dx.doi.org/10.21437/Interspeech.2019-2008
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis (2019)
- Author(s)
  Xin Wang, Junichi Yamagishi
- Journal Title
  Proc. 10th ISCA Speech Synthesis Workshop
  Volume: - Pages: 1-6
- DOI
  http://dx.doi.org/10.21437/SSW.2019-1
- Peer Reviewed / Open Access
[Journal Article] Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments (2019)
- Author(s)
  Yusuke Yasuda, Xin Wang, Junichi Yamagishi
- Journal Title
  Proc. 10th ISCA Speech Synthesis Workshop
  Volume: - Pages: 211-216
- DOI
  http://dx.doi.org/10.21437/SSW.2019-38
- Peer Reviewed / Open Access
[Journal Article] Rakugo speech synthesis using segment-to-segment neural transduction and style tokens ― toward speech synthesis for entertaining audiences (2019)
- Author(s)
  Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
- Journal Title
  Proc. 10th ISCA Speech Synthesis Workshop
  Volume: - Pages: 111-116
- DOI
  http://dx.doi.org/10.21437/SSW.2019-20
- Peer Reviewed / Open Access
[Journal Article] Bootstrapping Non-Parallel Voice Conversion from Speaker-Adaptive Text-to-Speech (2019)
- Author(s)
  Hieu-Thi Luong, Junichi Yamagishi
- Journal Title
  2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  Volume: - Pages: 200-207
- DOI
  https://doi.org/10.1109/ASRU46091.2019.9004008
- Peer Reviewed / Open Access
[Presentation] Multifaceted research on speaker individuality in speech (2019)
- Author(s)
  Junichi Yamagishi
- Organizer
  2019 Autumn Meeting of the Acoustical Society of Japan
- Invited
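[Presentation] Robust training of rakugo speech synthesis models and handling of variation in speaking style (2019)
- Author(s)
  Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
- Organizer
  2019 Autumn Meeting of the Acoustical Society of Japan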
[Presentation] An initial study of end-to-end speech synthesis without soft attention (2019)
- Author(s)
  Yusuke Yasuda, Junichi Yamagishi, Xin Wang
- Organizer
  2019 Autumn Meeting of the Acoustical Society of Japan
[Presentation] Speaker Identity Cloning and Protection (2019)
- Author(s)
  Junichi Yamagishi
- Organizer
  AFEKA SPEECH PROCESSING CONFERENCE 2019: 10-YEAR ANNIVERSARY CONFERENCE
- Int'l Joint Research / Invited