High-quality speech synthesis based on automatically-retrieved speech constraints

Research Project

Project/Area Number 16H06681
Research Category

Grant-in-Aid for Research Activity Start-up

Allocation Type Single-year Grants
Research Field Intelligent informatics
Research Institution The University of Tokyo

Principal Investigator

Takamichi Shinnosuke  The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90784330)

Project Period (FY) 2016-08-26 – 2018-03-31
Project Status Completed (Fiscal Year 2017)
Budget Amount
¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000)
Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2016: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Keywords speech synthesis / anti-spoofing / deep learning / speaker verification / voice spoofing / speech processing / voice conversion / machine learning
Outline of Final Research Achievements

We propose a speech synthesis method that incorporates generative adversarial networks (GANs). One cause of quality degradation in synthetic speech is the oversmoothing effect often observed in generated speech parameters. In the proposed framework, a discriminator is trained to distinguish natural speech parameters from generated ones. Because the GAN objective minimizes a divergence (i.e., the difference in distribution) between natural and generated speech parameters, the proposed method effectively alleviates oversmoothing of the generated parameters. Our evaluation found that 1) the proposed method generates more natural spectral parameters regardless of its hyperparameter settings, 2) a Wasserstein GAN, which minimizes the Earth-Mover's distance, works best for improving synthetic speech quality, and 3) the method can be extended to vocoder-free speech synthesis.
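
The distribution mismatch that the outline attributes to oversmoothing can be illustrated with a toy calculation. The sketch below is not the project's method (which trains a discriminator network); it only computes the empirical one-dimensional Earth-Mover's distance between two equally sized samples, and it models oversmoothing, as an assumption, by shrinking values toward their mean. The function names `earth_movers_distance_1d` and `oversmooth` and the `shrink` parameter are hypothetical.

```python
import random


def earth_movers_distance_1d(natural, generated):
    """Empirical 1-D Earth-Mover's (Wasserstein-1) distance.

    For two samples of equal size, this is the mean absolute
    difference between their sorted values.
    """
    if len(natural) != len(generated):
        raise ValueError("this sketch assumes equally sized samples")
    a, b = sorted(natural), sorted(generated)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def oversmooth(values, shrink=0.5):
    """Toy stand-in for oversmoothing (an assumption, not the real effect):
    pull every value toward the sample mean, which reduces the variance."""
    mean = sum(values) / len(values)
    return [mean + shrink * (v - mean) for v in values]


# Hypothetical "natural" parameter values drawn from a unit Gaussian.
rng = random.Random(0)
natural = [rng.gauss(0.0, 1.0) for _ in range(1000)]
smoothed = oversmooth(natural, shrink=0.5)

print(earth_movers_distance_1d(natural, natural))   # 0.0 for identical samples
print(earth_movers_distance_1d(natural, smoothed))  # positive: the distributions differ
```

A training criterion that penalizes this kind of distance would push the generated values back toward the natural spread, which is the intuition behind the outline's claim; the actual systems operate on high-dimensional speech parameters and estimate the distance with a learned critic.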

Report

(3 results)
  • 2017 Annual Research Report   Final Research Report ( PDF )
  • 2016 Annual Research Report
  • Research Products

    (28 results)

Journal Article (3 results, of which Peer Reviewed: 3, Open Access: 3); Presentation (23 results, of which Int'l Joint Research: 6, Invited: 1); Remarks (2 results)

  • [Journal Article] Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2018)

    • Author(s)
      Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 26 Issue: 1 Pages: 84-96

    • DOI

      10.1109/taslp.2017.2761547

    • Related Report
      2017 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Voice Conversion Using Input-to-Output Highway Networks (2017)

    • Author(s)
      Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E100.D Issue: 8 Pages: 1925-1928

    • DOI

      10.1587/transinf.2017EDL8034

    • NAID

      130005876129

    • ISSN
      0916-8532, 1745-1361
    • Related Report
      2017 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Voice conversion using input-to-output highway networks (2017)

    • Author(s)
      Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E100-D

    • NAID

      130005876129

    • Related Report
      2016 Annual Research Report
    • Peer Reviewed / Open Access
  • [Presentation] Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks (2018)

    • Author(s)
      Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Organizer
      IEEE ICASSP
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors (2018)

    • Author(s)
      Yuki Saito, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi
    • Organizer
      IEEE ICASSP
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
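  • [Presentation] Adversarial DNN-based speech synthesis using STFT spectra at multiple frequency resolutions (2018)

    • Author(s)
      Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Organizer
      Acoustical Society of Japan 2018 Spring Meeting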
    • Related Report
      2017 Annual Research Report
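  • [Presentation] Revisiting feature analysis for high-quality voice conversion (2018)

    • Author(s)
      Hitoshi Suda, Gaku Kotani, Shinnosuke Takamichi, Daisuke Saito
    • Organizer
      Acoustical Society of Japan 2018 Spring Meeting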
    • Related Report
      2017 Annual Research Report
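  • [Presentation] Adversarial training of noise generation models for DNN-based speech synthesis using speech recorded in noisy environments (2018)

    • Author(s)
      Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, Hiroshi Saruwatari
    • Organizer
      Acoustical Society of Japan 2018 Spring Meeting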
    • Related Report
      2017 Annual Research Report
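  • [Presentation] Modulation spectrum-constrained trajectory training and adaptation for GMM-based eigenvoice conversion (2017)

    • Author(s)
      Shinnosuke Takamichi
    • Organizer
      Acoustical Society of Japan 2017 Spring Meeting
    • Place of Presentation
      Meiji University, Ikuta Campus (Kanagawa Prefecture)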
    • Year and Date
      2017-03-15
    • Related Report
      2016 Annual Research Report
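  • [Presentation] A study of random generation of speech parameters using a moment matching network (2017)

    • Author(s)
      Shinnosuke Takamichi
    • Organizer
      Acoustical Society of Japan 2017 Spring Meeting
    • Place of Presentation
      Meiji University, Ikuta Campus (Kanagawa Prefecture)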
    • Year and Date
      2017-03-15
    • Related Report
      2016 Annual Research Report
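  • [Presentation] Voice conversion using sequence-to-sequence learning of context posterior probabilities (2017)

    • Author(s)
      Hiroyuki Miyoshi
    • Organizer
      Acoustical Society of Japan 2017 Spring Meeting
    • Place of Presentation
      Meiji University, Ikuta Campus (Kanagawa Prefecture)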
    • Year and Date
      2017-03-15
    • Related Report
      2016 Annual Research Report
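  • [Presentation] F0 and duration generation in adversarial DNN-based speech synthesis (2017)

    • Author(s)
      Yuki Saito
    • Organizer
      Acoustical Society of Japan 2017 Spring Meeting
    • Place of Presentation
      Meiji University, Ikuta Campus (Kanagawa Prefecture)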
    • Year and Date
      2017-03-15
    • Related Report
      2016 Annual Research Report
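  • [Presentation] Adversarial DNN-based voice conversion based on the differential spectrum method using highway networks (2017)

    • Author(s)
      Yuki Saito
    • Organizer
      Acoustical Society of Japan 2017 Spring Meeting
    • Place of Presentation
      Meiji University, Ikuta Campus (Kanagawa Prefecture)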
    • Year and Date
      2017-03-15
    • Related Report
      2016 Annual Research Report
  • [Presentation] Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis (2017)

    • Author(s)
      Yuki Saito
    • Organizer
      IEEE ICASSP
    • Place of Presentation
      New Orleans, USA
    • Year and Date
      2017-03-05
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
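  • [Presentation] A training algorithm adversarial to anti-spoofing for DNN-based text-to-speech synthesis (2017)

    • Author(s)
      Yuki Saito
    • Organizer
      Information Processing Society of Japan (IPSJ)
    • Place of Presentation
      Kotohira Grand Hotel Sakuranosho (Kagawa Prefecture)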
    • Year and Date
      2017-02-17
    • Related Report
      2016 Annual Research Report
  • [Presentation] Modulation spectrum-based speech parameter trajectory smoothing for DNN-based speech synthesis using FFT spectra (2017)

    • Author(s)
      Shinnosuke Takamichi
    • Organizer
      APSIPA ASC
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Voice Conversion Using Sequence-to-Sequence Learning of Context Posterior Probabilities (2017)

    • Author(s)
      Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Organizer
      INTERSPEECH
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Sampling-based speech parameter generation using moment-matching network (2017)

    • Author(s)
      Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
    • Organizer
      INTERSPEECH
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
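  • [Presentation] Non-parallel many-to-many voice conversion by variational autoencoders using phonetic posteriorgrams and d-vectors (2017)

    • Author(s)
      Yuki Saito, Yusuke Ijima, Kyosuke Nishida, Shinnosuke Takamichi
    • Organizer
      IEICE Technical Committee on Speech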
    • Related Report
      2017 Annual Research Report
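  • [Presentation] Adversarial training of noise generation models for speech synthesis using speech recorded in noisy environments (2017)

    • Author(s)
      Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, Hiroshi Saruwatari
    • Organizer
      IPSJ Special Interest Group on Spoken Language Processing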
    • Related Report
      2017 Annual Research Report
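  • [Presentation] Voice conversion using sequence-to-sequence learning of context posterior probabilities and evaluation of dual learning (2017)

    • Author(s)
      Hiroyuki Miyoshi, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Organizer
      IEICE Technical Committee on Speech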
    • Related Report
      2017 Annual Research Report
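  • [Presentation] Random generation of speech parameters in speech synthesis based on moment-matching networks (2017)

    • Author(s)
      Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
    • Organizer
      IPSJ Special Interest Group on Music and Computer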
    • Related Report
      2017 Annual Research Report
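  • [Presentation] Evaluation of inter-utterance variation in "ichigo-ichie" (once-in-a-lifetime) speech synthesis based on moment-matching networks (2017)

    • Author(s)
      Shinnosuke Takamichi, Tomoki Koriyama, Yuki Saito, Hiroshi Saruwatari
    • Organizer
      Acoustical Society of Japan 2017 Autumn Meeting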
    • Related Report
      2017 Annual Research Report
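  • [Presentation] Investigation of the effect of the divergence in adversarial DNN-based speech synthesis (2017)

    • Author(s)
      Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
    • Organizer
      Acoustical Society of Japan 2017 Autumn Meeting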
    • Related Report
      2017 Annual Research Report
  • [Presentation] Evaluation of DNN-based voice conversion adversarial to anti-spoofing (2017)

    • Author(s)
      Yuki Saito
    • Organizer
      IEICE 2017 Spring Meeting
    • Place of Presentation
      The University of Tokyo, Hongo Campus (Tokyo)
    • Related Report
      2016 Annual Research Report
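  • [Presentation] A training algorithm considering anti-spoofing for DNN-based speech synthesis (2016)

    • Author(s)
      Yuki Saito
    • Organizer
      Acoustical Society of Japan 2016 Autumn Meeting
    • Place of Presentation
      Meiji University, Ikuta Campus (Kanagawa Prefecture)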
    • Year and Date
      2016-09-14
    • Related Report
      2016 Annual Research Report
  • [Remarks] Adversarial DNN-Based Text-To-Speech Synthesis

    • URL

      http://sython.org/demo/icassp2017advtts/demo.html

    • Related Report
      2016 Annual Research Report
  • [Remarks] Adversarial DNN-Based Voice Conversion

    • URL

      http://sython.org/demo/sp201701advvc/demo.html

    • Related Report
      2016 Annual Research Report

Published: 2016-09-02   Modified: 2019-03-29  
