
Direct modeling of speech waveform using a DNN for text-to-speech synthesis

Research Project

Project/Area Number 16K16096
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Multi-year Fund
Research Field Perceptual information processing
Research Institution National Institute of Informatics

Principal Investigator

Takaki Shinji  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (50735090)

Project Period (FY) 2016-04-01 – 2019-03-31
Project Status Completed (Fiscal Year 2018)
Budget Amount
¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2018: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2017: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2016: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Keywords speech synthesis / DNN / spectrum / deep neural network / signal processing
Outline of Final Research Achievements

The purpose of this work is to realize text-to-speech synthesis based on direct modeling of the speech waveform with a deep neural network (DNN). To that end, we exclude the heuristic processing that conventional text-to-speech synthesis relies on. We investigated modeling of amplitude spectra obtained with simple windowing and a Fourier transform, modeling of spectra that include phase information, and direct modeling of the speech waveform. As a result, we realized a method that directly models the speech waveform for text-to-speech synthesis.
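As a rough illustration of the "simple windowing and Fourier transform" step mentioned above, the following sketch computes frame-wise amplitude spectra from a waveform. It is not the project's implementation: the Hann window, the frame length, the hop size, the toy sine input, and the helper names `hann_window`, `split_frames`, and `amplitude_spectrum` are all assumptions made for the example, and a naive DFT stands in for an FFT to keep the code dependency-free.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def amplitude_spectrum(frame):
    """Window one frame, take a naive O(n^2) DFT, keep magnitudes of bins 0..n/2."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann_window(n))]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# Toy input: a 1 kHz sine sampled at 16 kHz (hypothetical values).
sample_rate = 16000
frame_len, hop = 64, 32
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(256)]

# One amplitude spectrum per frame; phase is discarded by abs().
spectra = [amplitude_spectrum(f) for f in split_frames(tone, frame_len, hop)]
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print(len(spectra), len(spectra[0]), peak_bin)
```

With these assumed settings each bin spans 250 Hz, so the strongest bin of the toy tone is the one centered on 1 kHz. Because only magnitudes are kept, a model trained on such features needs a separate phase-recovery step to produce a waveform, which is the limitation the phase-aware and direct waveform approaches in the outline address.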

Academic Significance and Societal Importance of the Research Achievements

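To improve the performance of text-to-speech synthesis, a core technology of speech interfaces, speech waveform modeling with deep neural networks is being actively researched. This project took up this highly prominent research topic and improved the performance of text-to-speech synthesis. Many ripple effects can be expected, including better performance in existing systems that use text-to-speech synthesis and, as a consequence, wider adoption of applications built on it.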

Report

(4 results)
  • 2018 Annual Research Report   Final Research Report ( PDF )
  • 2017 Research-status Report
  • 2016 Research-status Report
  • Research Products

    (15 results)

Journal Article (3 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 3 results, Open Access: 2 results, Acknowledgement Compliant: 1 result) Presentation (12 results) (of which Int'l Joint Research: 5 results, Invited: 2 results)

  • [Journal Article] Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra (2019)

    • Author(s)
      Nakashika Toru, Takaki Shinji, Yamagishi Junichi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 27 Issue: 2 Pages: 244-254

    • DOI

      10.1109/taslp.2018.2877465

    • Related Report
      2018 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Investigating very deep highway networks for parametric speech synthesis (2018)

    • Author(s)
      Wang Xin, Takaki Shinji, Yamagishi Junichi
    • Journal Title

      Speech Communication

      Volume: 96 Pages: 1-9

    • DOI

      10.1016/j.specom.2017.11.002

    • Related Report
      2017 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis (2016)

    • Author(s)
      Xin Wang, Shinji Takaki, Junichi Yamagishi
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E99.D Issue: 10 Pages: 2471-2480

    • DOI

      10.1587/transinf.2016SLP0011

    • NAID

      130005598240

    • ISSN
      0916-8532, 1745-1361
    • Related Report
      2016 Research-status Report
    • Peer Reviewed / Acknowledgement Compliant
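  • [Presentation] Training of a DNN speech waveform model based on CWT spectral error (CWTスペクトル誤差に基づくDNN音声波形モデルの学習) (2019)

    • Author(s)
      Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
    • Organizer
      Speech Research Meeting (音声研究会)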
    • Related Report
      2018 Annual Research Report
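  • [Presentation] Training of a DNN speech waveform model based on spectral sequence error (スペクトル系列誤差に基づくDNN音声波形モデルの学習) (2018)

    • Author(s)
      Shinji Takaki, Toru Nakashika, Junichi Yamagishi
    • Organizer
      Acoustical Society of Japan Autumn Meeting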
    • Related Report
      2018 Annual Research Report
  • [Presentation] Advances in text-to-speech synthesis with deep learning (ディープラーニングによるテキスト音声合成の進展) (2018)

    • Author(s)
      Shinji Takaki
    • Organizer
      Acoustical Society of Japan Spring Meeting
    • Related Report
      2017 Research-status Report
    • Invited
  • [Presentation] An Autoregressive Recurrent Mixture Density Network for Parametric Speech Synthesis (2017)

    • Author(s)
      Xin Wang, Shinji Takaki, Junichi Yamagishi
    • Organizer
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • Place of Presentation
      Hilton Conference Centre, New Orleans, USA
    • Year and Date
      2017-03-07
    • Related Report
      2016 Research-status Report
    • Int'l Joint Research
  • [Presentation] Very deep text-to-speech synthesis (とてもDeepなテキスト音声合成) (2017)

    • Author(s)
      Shinji Takaki
    • Organizer
      Speech Research Meeting (音声研究会)
    • Place of Presentation
      The University of Tokyo
    • Year and Date
      2017-01-21
    • Related Report
      2016 Research-status Report
    • Invited
  • [Presentation] Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis (2017)

    • Author(s)
      Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
    • Organizer
      INTERSPEECH
    • Related Report
      2017 Research-status Report
    • Int'l Joint Research
  • [Presentation] Complex-valued restricted Boltzmann machine for direct learning of frequency spectra (2017)

    • Author(s)
      Toru Nakashika, Shinji Takaki, Junichi Yamagishi
    • Organizer
      INTERSPEECH
    • Related Report
      2017 Research-status Report
    • Int'l Joint Research
  • [Presentation] Generative Adversarial Network-based Postfilter for STFT Spectrograms (2017)

    • Author(s)
      Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
    • Organizer
      INTERSPEECH
    • Related Report
      2017 Research-status Report
    • Int'l Joint Research
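  • [Presentation] Speech waveform generation based on phase recovery using FFT spectra for DNN-based text-to-speech synthesis (DNNに基づくテキスト音声合成のためのFFTスペクトルを用いた位相復元に基づく音声波形生成) (2016)

    • Author(s)
      Shinji Takaki, SangJin Kim, Hirokazu Kameoka, Junichi Yamagishi
    • Organizer
      18th Spoken Language Symposium
    • Place of Presentation
      NTT Musashino R&D Center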
    • Year and Date
      2016-12-20
    • Related Report
      2016 Research-status Report
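  • [Presentation] Investigation of using speaker, gender, and age codes in DNN-based text-to-speech synthesis (DNNに基づくテキスト音声合成における話者・ジェンダー・年齢コード利用の検討) (2016)

    • Author(s)
      Hieu Thi Luong, Shinji Takaki, SangJin Kim, Junichi Yamagishi
    • Organizer
      Speech Research Meeting (音声研究会)
    • Place of Presentation
      Shizuoka University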
    • Year and Date
      2016-10-27
    • Related Report
      2016 Research-status Report
  • [Presentation] Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis (2016)

    • Author(s)
      Shinji Takaki, SangJin Kim, Junichi Yamagishi
    • Organizer
      9th Speech Synthesis Workshop (SSW9)
    • Place of Presentation
      Plug and Play Tech Center
    • Year and Date
      2016-09-14
    • Related Report
      2016 Research-status Report
    • Int'l Joint Research
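  • [Presentation] Performance evaluation of HMM-, DNN-, and RNN-based speech synthesis systems using a very large single-speaker dataset (巨大特定話者データを用いたHMM・DNN・RNNに基づく音声合成システムの性能評価) (2016)

    • Author(s)
      Xin Wang, Shinji Takaki, Junichi Yamagishi
    • Organizer
      112th Spoken Language Information Processing Research Meeting (第112回音声言語情報処理研究)
    • Place of Presentation
      Hohoemi no Yado Takinoyu, Tendo Onsen, Kamata-honcho, Tendo, Yamagata Prefecture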
    • Year and Date
      2016-07-28
    • Related Report
      2016 Research-status Report

Published: 2016-04-21   Modified: 2020-03-30  
