2018 Fiscal Year Final Research Report

Direct modeling of speech waveform using a DNN for text-to-speech synthesis

Project/Area Number	16K16096
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	National Institute of Informatics
Principal Investigator	Takaki Shinji, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (50735090)
Project Period (FY)	2016-04-01 – 2019-03-31
Keywords	Speech synthesis / DNN
Outline of Final Research Achievements	This project aimed to realize text-to-speech synthesis based on direct modeling of speech waveforms with a deep neural network (DNN), removing the heuristic processing built into conventional text-to-speech synthesis. We investigated three approaches: modeling amplitude spectra obtained with simple windowing and a Fourier transform, modeling spectra that include phase information, and modeling the speech waveform directly. As a result, we realized a direct waveform modeling method for text-to-speech synthesis.
Free Research Field	Speech information processing
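Academic Significance and Societal Importance of the Research Achievements	Speech waveform modeling with deep neural networks is being actively studied as a way to improve the performance of text-to-speech synthesis, a core technology of speech interfaces. This project addressed this highly prominent research topic and improved the performance of text-to-speech synthesis. Many ripple effects can be expected, such as better performance in existing systems that use text-to-speech synthesis and, as a result, wider adoption of applications built on it.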