2020 Fiscal Year Annual Research Report
One model for all sounds: fast and high-quality neural source-filter model for speech and non-speech waveform modeling
Project/Area Number | 19K24371
Research Institution | National Institute of Informatics
Principal Investigator | Wang Xin, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (60843141)
Project Period (FY) | 2019-08-30 – 2021-03-31
Keywords | Speech synthesis / Waveform modeling / Deep learning / Neural network
Outline of Annual Research Achievements
How to generate natural-sounding speech waveforms from a digital system is a fundamental question in speech science. The purpose of this project is to combine classical speech science with recent deep-learning techniques and design a neural waveform model that generates high-quality waveforms quickly. Specifically, the project has three goals: 1. fast waveform generation; 2. improved quality of the generated waveforms; 3. generation of not only speech but also non-speech waveforms. In the first year, we proposed a family of neural source-filter waveform models that combines the classical source-filter model of speech production with dilated convolutional neural networks, and we achieved all three goals. During the second year, we extended the proposed models to further address the second and third goals. For the second goal, we enhanced the models with a trainable cyclic-noise-based source module and demonstrated its better performance when modeling multiple speakers' speech data with a single model; this work was published at Interspeech 2020. We also designed optional trainable digital FIR filters for the proposed models so that they can better model speech data containing reverberation; this work was published at Interspeech 2020 and IEEE SLT 2021. For the third goal, we applied the proposed models to polyphonic piano sound modeling and demonstrated that they work not only on monophonic but also on polyphonic sounds; a paper on this work is in preparation. Finally, we re-implemented and open-sourced the proposed models in PyTorch, a popular deep-learning framework.
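As a rough illustration of the model family described above, the following is a minimal sketch, not the published implementation: a sine-based source module driven by F0 feeds a dilated-convolution filter module, optionally followed by a trainable FIR stage (here a single learned Conv1d, a simplifying assumption). Conditioning on spectral features and the exact layer configurations of the papers are omitted; all sizes and names are illustrative.

```python
# Minimal sketch of the neural source-filter idea (illustrative only;
# layer sizes, names, and the single-FIR stage are assumptions, not
# the published configuration).
import math
import torch
import torch.nn as nn


class SineSource(nn.Module):
    """Sine-wave excitation generated from a sample-level F0 contour."""

    def __init__(self, sampling_rate=16000, noise_std=0.003):
        super().__init__()
        self.sampling_rate = sampling_rate
        self.noise_std = noise_std

    def forward(self, f0):
        # f0: (batch, num_samples) in Hz, 0 for unvoiced samples.
        # Integrate the instantaneous frequency to obtain the phase.
        phase = 2 * math.pi * torch.cumsum(f0 / self.sampling_rate, dim=1)
        voiced = (f0 > 0).float()
        # Sine in voiced regions plus weak Gaussian noise everywhere.
        return voiced * torch.sin(phase) + self.noise_std * torch.randn_like(f0)


class DilatedFilter(nn.Module):
    """Dilated-convolution filter that transforms the excitation."""

    def __init__(self, channels=64, num_layers=5):
        super().__init__()
        blocks = []
        for i in range(num_layers):
            blocks += [nn.Conv1d(channels, channels, kernel_size=3,
                                 dilation=2 ** i, padding=2 ** i),
                       nn.Tanh()]
        self.pre = nn.Conv1d(1, channels, kernel_size=1)
        self.stack = nn.Sequential(*blocks)
        self.post = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, excitation):
        # (batch, num_samples) -> (batch, num_samples)
        x = self.pre(excitation.unsqueeze(1))
        return self.post(self.stack(x)).squeeze(1)


# Optional trainable FIR stage, standing in for the reverberation
# extension mentioned above (a single learned Conv1d; an assumption).
fir = nn.Conv1d(1, 1, kernel_size=64, padding=63, bias=False)

# Toy usage: 0.1 s of a 220 Hz tone at 16 kHz.
f0 = torch.full((1, 1600), 220.0)
wave = DilatedFilter()(SineSource()(f0))
wave = fir(wave.unsqueeze(1))[..., :wave.shape[-1]].squeeze(1)
print(wave.shape)  # torch.Size([1, 1600])
```

In the published models the filter module is additionally conditioned on frame-level acoustic features (e.g., a mel-spectrogram) upsampled to the waveform sampling rate, which this sketch leaves out.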
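The cyclic-noise source mentioned above can be pictured roughly as follows: a pitch-synchronous pulse train convolved with a short noise segment whose decay rate is trainable. This is a hypothetical simplification of the Interspeech 2020 module; the decay parameterization and segment length are assumptions.

```python
# Rough sketch of a cyclic-noise source (a simplification of the
# Interspeech 2020 module; the decay parameterization is an assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CyclicNoiseSource(nn.Module):
    def __init__(self, sampling_rate=16000, segment_len=160):
        super().__init__()
        self.sampling_rate = sampling_rate
        self.segment_len = segment_len
        # Trainable decay rate of the noise segment.
        self.log_decay = nn.Parameter(torch.zeros(1))

    def forward(self, f0):
        # f0: (batch, num_samples) in Hz, 0 for unvoiced samples.
        # Emit a unit impulse each time the accumulated phase wraps around.
        increment = f0 / self.sampling_rate
        phase = torch.cumsum(increment, dim=1)
        pulses = (phase.frac() < increment).float()
        # Exponentially decaying noise segment, shared across pulses.
        t = torch.arange(self.segment_len, dtype=torch.float32)
        envelope = torch.exp(-torch.exp(self.log_decay) * t / self.segment_len)
        kernel = (torch.randn(self.segment_len) * envelope).flip(0).view(1, 1, -1)
        # Causal convolution: each pulse triggers a decaying noise burst.
        out = F.conv1d(pulses.unsqueeze(1), kernel, padding=self.segment_len - 1)
        return out[..., : f0.shape[1]].squeeze(1)


# Toy usage: cyclic noise for a 220 Hz voiced segment.
f0 = torch.full((1, 1600), 220.0)
print(CyclicNoiseSource()(f0).shape)  # torch.Size([1, 1600])
```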
Remarks
Web (1) is the home page of the proposed neural source-filter waveform models; Web (2) is the PyTorch source code of the proposed models; Web (3) and (4) are CUDA-based source code of the proposed models.
Research Products (10 results)