Robust voice cloning technologies in noisy environments and its applications

Research Project

Project/Area Number	17H04687
Research Category	Grant-in-Aid for Young Scientists (A)
Allocation Type	Single-year Grants
Research Field	Perceptual information processing
Research Institution	National Institute of Informatics
Principal Investigator	Yamagishi Junichi, National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (70709352)
Project Period (FY)	2017-04-01 – 2020-03-31
Project Status	Completed (Fiscal Year 2019)
Budget Amount	¥21,710,000 (Direct Cost: ¥16,700,000, Indirect Cost: ¥5,010,000); Fiscal Year 2019: ¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000); Fiscal Year 2018: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000); Fiscal Year 2017: ¥9,620,000 (Direct Cost: ¥7,400,000, Indirect Cost: ¥2,220,000)
Keywords	Speech information processing / Speech synthesis / Deep learning / Speaker adaptation / Speech enhancement / Digital clone
Outline of Final Research Achievements	Digital voice cloning is an application of speaker adaptation in speech synthesis. In this study, we proposed new elemental technologies and constructed a database for speech that was recorded in poor conditions and was not intended for speech synthesis. First, we constructed a parallel database, "DR-VCTK", that pairs low-quality speech with the original high-quality speech. Second, we proposed a new neural network structure called the "multi-modal architecture" that enables digital voice cloning from speech data without text transcriptions. In addition, we proposed a new neural network structure incorporating a speaker encoder that uses speech recorded in poor environments as training data, and showed that unsupervised speaker adaptation can be performed even from low-quality speech recorded in non-ideal environments.
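Academic Significance and Societal Importance of the Research Achievements	Acoustic models for speech synthesis are normally trained only on high-quality speech recorded in a studio. Consequently, it has not been easy to perform speech synthesis from speech that contains noise and reverberation or that was recorded with low-quality equipment, and it is fair to say that no research theory for doing so had been established at all. This research removed the constraints of existing technology and made digital voice cloning possible even under poor recording conditions or when no ground-truth labels are available. It therefore has academic significance in that it further matured speech synthesis and speaker adaptation technologies on a theoretical level. In addition, the range of applications of speech synthesis and speaker adaptation is expected to grow explosively, so its societal significance is also large.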

Report

(4 results)

2019 Annual Research Report
Final Research Report (PDF)
2018 Annual Research Report
2017 Annual Research Report

Research Products
(76 results)

Int'l Joint Research (11 results) Journal Article (39 results) (of which Int'l Joint Research: 20 results, Peer Reviewed: 39 results, Open Access: 37 results) Presentation (26 results) (of which Int'l Joint Research: 11 results, Invited: 3 results)

[Int'l Joint Research] National University of Singapore (Singapore)
- Related Report
  2019 Annual Research Report
[Int'l Joint Research] Aalto University (Finland)
- Related Report
  2019 Annual Research Report
[Int'l Joint Research] MIT/JHU (USA)
- Related Report
  2019 Annual Research Report
[Int'l Joint Research] Aalto University (Finland)
- Related Report
  2018 Annual Research Report
[Int'l Joint Research] Polytechnic University of Catalonia (Spain)
- Related Report
  2018 Annual Research Report
[Int'l Joint Research] University of Edinburgh (UK)
- Related Report
  2017 Annual Research Report
[Int'l Joint Research] Aalto University/University of Eastern Finland (Finland)
- Related Report
  2017 Annual Research Report
[Int'l Joint Research] Oben (USA)
- Related Report
  2017 Annual Research Report
[Int'l Joint Research] University of Science and Technology of China (China)
- Related Report
  2017 Annual Research Report
[Int'l Joint Research] Austrian Academy of Sciences/Austrian Research Institute for AI/University of Applied Sciences (Austria)
- Related Report
  2017 Annual Research Report
[Int'l Joint Research]
- Related Report
  2017 Annual Research Report
[Journal Article] Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis (2020)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 28 Pages: 402-415
- DOI
  10.1109/taslp.2019.2956145
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Zero-Shot Multi-Speaker Text-To-Speech with State-Of-The-Art Neural Speaker Embeddings (2020)
- Author(s)
  Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
- Journal Title
  
  ICASSP 2020
  
  Volume: - Pages: 6184-6188
- DOI
  10.1109/icassp40776.2020.9054535
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Effect of Choice of Probability Distribution, Randomness, and Search Methods for Alignment Modeling in Sequence-to-Sequence Text-to-Speech Synthesis Using Hard Alignment (2020)
- Author(s)
  Y. Yasuda, X. Wang and J. Yamagishi
- Journal Title
  
  2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  
  Volume: - Pages: 6724-6728
- DOI
  10.1109/icassp40776.2020.9053546
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Joint Training Framework for Text-to-Speech and Voice Conversion Using Multi-Source Tacotron and WaveNet (2019)
- Author(s)
  Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi
- Journal Title
  
  Proc. Interspeech 2019
  
  Volume: - Pages: 1298-1302
- DOI
  10.21437/interspeech.2019-1357
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-Spectrogram (2019)
- Author(s)
  Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
- Journal Title
  
  Proc. Interspeech 2019
  
  Volume: - Pages: 694-698
- DOI
  10.21437/interspeech.2019-2008
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis (2019)
- Author(s)
  Xin Wang, Junichi Yamagishi
- Journal Title
  
  Proc. 10th ISCA Speech Synthesis Workshop
  
  Volume: - Pages: 1-6
- DOI
  10.21437/ssw.2019-1
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments (2019)
- Author(s)
  Yusuke Yasuda, Xin Wang, Junichi Yamagishi
- Journal Title
  
  Proc. 10th ISCA Speech Synthesis Workshop
  
  Volume: - Pages: 211-216
- DOI
  10.21437/ssw.2019-38
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Rakugo speech synthesis using segment-to-segment neural transduction and style tokens - toward speech synthesis for entertaining audiences (2019)
- Author(s)
  Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  Proc. 10th ISCA Speech Synthesis Workshop
  
  Volume: - Pages: 111-116
- DOI
  10.21437/ssw.2019-20
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Bootstrapping Non-Parallel Voice Conversion from Speaker-Adaptive Text-to-Speech (2019)
- Author(s)
  Hieu-Thi Luong, Junichi Yamagishi
- Journal Title
  
  2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  
  Volume: - Pages: 200-207
- DOI
  10.1109/asru46091.2019.9004008
- Related Report
  2019 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra (2019)
- Author(s)
  Toru Nakashika, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 27 Issue: 2 Pages: 244-254
- DOI
  10.1109/taslp.2018.2877465
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] STFT spectral loss for training a neural speech waveform model (2019)
- Author(s)
  Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
- Journal Title
  
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: -
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Neural source-filter-based waveform model for statistical parametric speech synthesis (2019)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: -
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language (2019)
- Author(s)
  Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: -
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks (2019)
- Author(s)
  Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
- Journal Title
  
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: -
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics (2019)
- Author(s)
  Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen
- Journal Title
  
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: -
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion (2019)
- Author(s)
  Shreyas Seshadri, Lauri Juvela, Junichi Yamagishi, Okko Rasanen, Paavo Alku
- Journal Title
  
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: -
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis (2018)
- Author(s)
  Manu Airaksinen, Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 26 Issue: 9 Pages: 1658-1670
- DOI
  10.1109/taslp.2018.2835720
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Expressive Speech Synthesis Using Sentiment Embeddings (2018)
- Author(s)
  Igor Jauk, Jaime Lorenzo-Trueba, Junichi Yamagishi, Antonio Bonafonte
- Journal Title
  
  Proc. Interspeech 2018
  
  Volume: - Pages: 3062-3066
- DOI
  10.21437/interspeech.2018-2467
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Speaker-independent Raw Waveform Model for Glottal Excitation (2018)
- Author(s)
  Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
- Journal Title
  
  Proc. Interspeech 2018
  
  Volume: - Pages: 2012-2016
- DOI
  10.21437/interspeech.2018-1635
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation (2018)
- Author(s)
  Hieu-Thi Luong, Junichi Yamagishi
- Journal Title
  
  Proc. Interspeech 2018
  
  Volume: - Pages: 2494-2498
- DOI
  10.21437/interspeech.2018-1791
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Wasserstein GAN and waveform loss-based acoustic model training for multi-speaker text-to-speech synthesis systems using a WaveNet neural vocoder (2018)
- Author(s)
  Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, and Nobuaki Minematsu
- Journal Title
  
  IEEE Access
  
  Volume: 6 Pages: 60478-60488
- DOI
  10.1109/access.2018.2872060
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems (2018)
- Author(s)
  Hieu-Thi Luong, Junichi Yamagishi
- Journal Title
  
  2018 IEEE Spoken Language Technology Workshop (SLT)
  
  Volume: - Pages: 610-617
- DOI
  10.1109/slt.2018.8639659
- Related Report
  2018 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Speech Enhancement of Noisy and Reverberant Speech for Text-to-Speech (2018)
- Author(s)
  Cassia Valentini-Botinhao, Junichi Yamagishi
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 26 Issue: 8 Pages: 1420-1433
- DOI
  10.1109/taslp.2018.2828980
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Autoregressive neural F0 model for statistical parametric speech synthesis (2018)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  
  Volume: 26 Issue: 8 Pages: 1406-1419
- DOI
  10.1109/taslp.2018.2828650
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment (2018)
- Author(s)
  Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio and Zhenhua Ling
- Journal Title
  
  Speaker Odyssey 2018
  
  Volume: -
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods (2018)
- Author(s)
  Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen and Zhenhua Ling
- Journal Title
  
  Speaker Odyssey 2018
  
  Volume: -
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama’s voice using GAN, WaveNet and low-quality found data (2018)
- Author(s)
  Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi and Tomi Kinnunen
- Journal Title
  
  Speaker Odyssey 2018
  
  Volume: -
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] High-Quality Nonparallel Voice Conversion Based on Cycle-Consistent Adversarial Network (2018)
- Author(s)
  Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba
- Journal Title
  
  2018 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: - Pages: 5279-5283
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody (2018)
- Author(s)
  Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, Junichi Yamagishi
- Journal Title
  
  2018 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: - Pages: 4799-4803
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks (2018)
- Author(s)
  Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
- Journal Title
  
  2018 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: - Pages: 5679-5683
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis (2018)
- Author(s)
  Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi
- Journal Title
  
  2018 IEEE International Conference on Acoustics, Speech and Signal Processing
  
  Volume: - Pages: 4804-4808
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Investigating very deep highway networks for parametric speech synthesis (2018)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  Speech Communication
  
  Volume: 96 Pages: 1-9
- DOI
  10.1016/j.specom.2017.11.002
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices (2017)
- Author(s)
  Michael Pucher, Bettina Zillinger, Markus Toman, Junichi Yamagishi, Erich Schmid, Cassia Valentini-Botinhao, Dietmar Schabus, Thomas Woltron
- Journal Title
  
  Computer Speech & Language
  
  Volume: 46 Pages: 179-195
- DOI
  10.1016/j.csl.2017.05.010
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] An RNN-based Quantized F0 Model with Multi-tier Feedback Links for Text-to-Speech Synthesis (2017)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  Interspeech 2017
  
  Volume: - Pages: 1059-1063
- DOI
  10.21437/interspeech.2017-246
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis (2017)
- Author(s)
  Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
- Journal Title
  
  Interspeech 2017
  
  Volume: - Pages: 1128-1132
- DOI
  10.21437/interspeech.2017-488
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Complex-valued restricted Boltzmann machine for direct learning of frequency spectra (2017)
- Author(s)
  Toru Nakashika, Shinji Takaki, Junichi Yamagishi
- Journal Title
  
  Interspeech 2017
  
  Volume: - Pages: 4021-4025
- DOI
  10.21437/interspeech.2017-584
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system (2017)
- Author(s)
  Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
- Journal Title
  
  Interspeech 2017
  
  Volume: - Pages: 1368-1372
- DOI
  10.21437/interspeech.2017-848
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Generative Adversarial Network-based Postfilter for STFT Spectrograms (2017)
- Author(s)
  Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
- Journal Title
  
  Interspeech 2017
  
  Volume: - Pages: 3389-3393
- DOI
  10.21437/interspeech.2017-962
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Learning word vector representations based on acoustic counts (2017)
- Author(s)
  M. Sam Ribeiro, Oliver Watts, Junichi Yamagishi
- Journal Title
  
  Interspeech 2017
  
  Volume: - Pages: 799-803
- DOI
  10.21437/interspeech.2017-1340
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] A multifaceted study of speaker individuality in speech (2019)
- Author(s)
  Junichi Yamagishi
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting
- Related Report
  2019 Annual Research Report
- Invited
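[Presentation] Robust training methods for rakugo speech synthesis models and handling of variation in speaking style (2019)
- Author(s)
  Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting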
- Related Report
  2019 Annual Research Report
[Presentation] Initial investigation of end-to-end speech synthesis without soft attention (2019)
- Author(s)
  Yusuke Yasuda, Junichi Yamagishi, Xin Wang
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting
- Related Report
  2019 Annual Research Report
[Presentation] Speaker Identity Cloning and Protection (2019)
- Author(s)
  Junichi Yamagishi
- Organizer
  Afeka Speech Processing Conference 2019: 10-Year Anniversary Conference
- Related Report
  2019 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems (2019)
- Author(s)
  Hieu-Thi Luong, Junichi Yamagishi
- Organizer
  2018 IEEE Spoken Language Technology Workshop (SLT)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] STFT spectral loss for training a neural speech waveform model (2019)
- Author(s)
  Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
- Organizer
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Neural source-filter-based waveform model for statistical parametric speech synthesis (2019)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Organizer
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language (2019)
- Author(s)
  Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
- Organizer
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks (2019)
- Author(s)
  Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
- Organizer
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics (2019)
- Author(s)
  Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen
- Organizer
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion (2019)
- Author(s)
  Shreyas Seshadri, Lauri Juvela, Junichi Yamagishi, Okko Rasanen, Paavo Alku
- Organizer
  2019 IEEE International Conference on Acoustics, Speech and Signal Processing
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Expressive Speech Synthesis Using Sentiment Embeddings (2018)
- Author(s)
  Igor Jauk, Jaime Lorenzo-Trueba, Junichi Yamagishi, Antonio Bonafonte
- Organizer
  Interspeech 2018
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Speaker-independent Raw Waveform Model for Glottal Excitation (2018)
- Author(s)
  Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
- Organizer
  Interspeech 2018
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation (2018)
- Author(s)
  Hieu-Thi Luong, Junichi Yamagishi
- Organizer
  Interspeech 2018
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Investigation of WaveNet for Text-to-Speech Synthesis (2018)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Organizer
  120th IPSJ Spoken Language Processing SIG Meeting
- Related Report
  2017 Annual Research Report
[Presentation] Stealing your vocal identity from the internet: cloning Obama's voice from found data using GAN and WaveNet (2018)
- Author(s)
  Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi
- Organizer
  120th IPSJ Spoken Language Processing SIG Meeting
- Related Report
  2017 Annual Research Report
[Presentation] Generating segment-level foreign-accented synthetic speech with natural speech prosody (2018)
- Author(s)
  Gustav Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, Junichi Yamagishi
- Organizer
  120th IPSJ Spoken Language Processing SIG Meeting
- Related Report
  2017 Annual Research Report
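[Presentation] Complex spectral sequence modeling using a complex-valued restricted Boltzmann machine with a recurrent structure (2018)
- Author(s)
  Toru Nakashika, Shinji Takaki, Junichi Yamagishi
- Organizer
  120th IPSJ Spoken Language Processing SIG Meeting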
- Related Report
  2017 Annual Research Report
[Presentation] Cross-lingual voice conversion using CycleGAN (2018)
- Author(s)
  Fuming Fang, Jaime Lorenzo-Trueba, Junichi Yamagishi, Isao Echizen
- Organizer
  120th IPSJ Spoken Language Processing SIG Meeting
- Related Report
  2017 Annual Research Report
[Presentation] High-quality non-parallel voice conversion using CycleGAN (2017)
- Author(s)
  Fuming Fang, Junichi Yamagishi, Isao Echizen
- Organizer
  19th Spoken Language Symposium
- Related Report
  2017 Annual Research Report
[Presentation] Analyzing the impact of including listener perception annotations in RNN-based emotional speech synthesis (2017)
- Author(s)
  Jaime Lorenzo-Trueba, Gustav Henter, Shinji Takaki, Junichi Yamagishi
- Organizer
  19th Spoken Language Symposium
- Related Report
  2017 Annual Research Report
[Presentation] ASVspoof: Liveness detection in speaker verification (2017)
- Author(s)
  Junichi Yamagishi
- Organizer
  7th Symposium on Biometrics, Recognition and Authentication
- Related Report
  2017 Annual Research Report
- Invited
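[Presentation] Improvement and evaluation of speech spectrum modeling using a complex-valued RBM (2017)
- Author(s)
  Toru Nakashika, Shinji Takaki, Junichi Yamagishi
- Organizer
  Acoustical Society of Japan 2017 Autumn Meeting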
- Related Report
  2017 Annual Research Report
[Presentation] Autoregressive quantized F0 modeling using a recurrent neural network with feedback links (2017)
- Author(s)
  Xin Wang, Shinji Takaki, Junichi Yamagishi
- Organizer
  August 2017 Speech Research Meeting
- Related Report
  2017 Annual Research Report
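[Presentation] Complex-valued RBM: a complex-valued extension of the restricted Boltzmann machine, with application to and evaluation on speech signals (2017)
- Author(s)
  Toru Nakashika, Shinji Takaki, Junichi Yamagishi
- Organizer
  117th Spoken Language Processing SIG Meeting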
- Related Report
  2017 Annual Research Report
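[Presentation] Post-filtering of STFT spectrograms based on adversarial training (2017)
- Author(s)
  Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
- Organizer
  June 2017 Speech Research Meeting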
- Related Report
  2017 Annual Research Report
