2020 Fiscal Year Annual Research Report
One model for all sounds: fast and high-quality neural source-filter model for speech and non-speech waveform modeling
Project/Area Number | 19K24371
Research Institution | National Institute of Informatics
Principal Investigator | Wang Xin, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (60843141)
Project Period (FY) | 2019-08-30 – 2021-03-31
Keywords | Speech synthesis / Waveform modeling / Deep learning / Neural network
Outline of Annual Research Achievements
How to generate natural-sounding speech waveforms from a digital system is a fundamental question in speech science. The purpose of this project is to combine classical speech science with recent deep-learning techniques and design a neural waveform model that generates high-quality waveforms quickly. Specifically, the project has three goals: 1. fast waveform generation; 2. improved quality of the generated waveforms; 3. generation of not only speech but also non-speech waveforms. In the first year, we proposed a family of neural source-filter waveform models that combines the classical source-filter model of speech production with dilated convolutional neural networks, and we achieved all three goals. During the second year, we extended the proposed models to further address the second and third goals. For the second goal, we enhanced the models with a trainable cyclic-noise-based source module and demonstrated its better performance when modeling multiple speakers' speech data with a single model; this work was published at Interspeech 2020. We also designed optional trainable digital FIR filters for the proposed models so that they can better model speech data containing reverberation; this work was published at Interspeech 2020 and IEEE SLT 2021. For the third goal, we applied the proposed models to polyphonic piano sound modeling and demonstrated that they work not only on monophonic but also on polyphonic sounds; a paper on this work is in preparation. Finally, we re-implemented and open-sourced the proposed models in PyTorch, a popular deep-learning framework.
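As a rough illustration of the model family described above, the following is a minimal sketch, not the published implementation: a sine-based source module driven by F0 feeds a dilated-convolution filter module, optionally followed by a trainable FIR stage (here a single learned Conv1d, a simplifying assumption). Conditioning on spectral features and the exact layer configurations of the papers are omitted; all sizes and names are illustrative.

```python
# Minimal sketch of the neural source-filter idea (illustrative only;
# layer sizes, names, and the single-FIR stage are assumptions, not
# the published configuration).
import math
import torch
import torch.nn as nn


class SineSource(nn.Module):
    """Sine-wave excitation generated from a sample-level F0 contour."""

    def __init__(self, sampling_rate=16000, noise_std=0.003):
        super().__init__()
        self.sampling_rate = sampling_rate
        self.noise_std = noise_std

    def forward(self, f0):
        # f0: (batch, num_samples) in Hz, 0 for unvoiced samples.
        # Integrate the instantaneous frequency to obtain the phase.
        phase = 2 * math.pi * torch.cumsum(f0 / self.sampling_rate, dim=1)
        voiced = (f0 > 0).float()
        # Sine in voiced regions plus weak Gaussian noise everywhere.
        return voiced * torch.sin(phase) + self.noise_std * torch.randn_like(f0)


class DilatedFilter(nn.Module):
    """Dilated-convolution filter that transforms the excitation."""

    def __init__(self, channels=64, num_layers=5):
        super().__init__()
        blocks = []
        for i in range(num_layers):
            blocks += [nn.Conv1d(channels, channels, kernel_size=3,
                                 dilation=2 ** i, padding=2 ** i),
                       nn.Tanh()]
        self.pre = nn.Conv1d(1, channels, kernel_size=1)
        self.stack = nn.Sequential(*blocks)
        self.post = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, excitation):
        # (batch, num_samples) -> (batch, num_samples)
        x = self.pre(excitation.unsqueeze(1))
        return self.post(self.stack(x)).squeeze(1)


# Optional trainable FIR stage, standing in for the reverberation
# extension mentioned above (a single learned Conv1d; an assumption).
fir = nn.Conv1d(1, 1, kernel_size=64, padding=63, bias=False)

# Toy usage: 0.1 s of a 220 Hz tone at 16 kHz.
f0 = torch.full((1, 1600), 220.0)
wave = DilatedFilter()(SineSource()(f0))
wave = fir(wave.unsqueeze(1))[..., :wave.shape[-1]].squeeze(1)
print(wave.shape)  # torch.Size([1, 1600])
```

In the published models the filter module is additionally conditioned on frame-level acoustic features (e.g., a mel-spectrogram) upsampled to the waveform sampling rate, which this sketch leaves out.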
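The cyclic-noise source mentioned above can be pictured roughly as follows: a pitch-synchronous pulse train convolved with a short noise segment whose decay rate is trainable. This is a hypothetical simplification of the Interspeech 2020 module; the decay parameterization and segment length are assumptions.

```python
# Rough sketch of a cyclic-noise source (a simplification of the
# Interspeech 2020 module; the decay parameterization is an assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CyclicNoiseSource(nn.Module):
    def __init__(self, sampling_rate=16000, segment_len=160):
        super().__init__()
        self.sampling_rate = sampling_rate
        self.segment_len = segment_len
        # Trainable decay rate of the noise segment.
        self.log_decay = nn.Parameter(torch.zeros(1))

    def forward(self, f0):
        # f0: (batch, num_samples) in Hz, 0 for unvoiced samples.
        # Emit a unit impulse each time the accumulated phase wraps around.
        increment = f0 / self.sampling_rate
        phase = torch.cumsum(increment, dim=1)
        pulses = (phase.frac() < increment).float()
        # Exponentially decaying noise segment, shared across pulses.
        t = torch.arange(self.segment_len, dtype=torch.float32)
        envelope = torch.exp(-torch.exp(self.log_decay) * t / self.segment_len)
        kernel = (torch.randn(self.segment_len) * envelope).flip(0).view(1, 1, -1)
        # Causal convolution: each pulse triggers a decaying noise burst.
        out = F.conv1d(pulses.unsqueeze(1), kernel, padding=self.segment_len - 1)
        return out[..., : f0.shape[1]].squeeze(1)


# Toy usage: cyclic noise for a 220 Hz voiced segment.
f0 = torch.full((1, 1600), 220.0)
print(CyclicNoiseSource()(f0).shape)  # torch.Size([1, 1600])
```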
Remarks
Web (1) is the home page of the proposed neural source-filter waveform models; Web (2) is the PyTorch source code of the proposed models; Web (3) and (4) are CUDA-based source code of the proposed models.
Research Products (10 results)