Summary of Research Achievements
How to generate natural-sounding speech waveforms from a digital system is a fundamental question in speech science. The purpose of this project is to combine classical speech science with recent deep-learning techniques and to design a neural waveform model that generates high-quality waveforms quickly. Specifically, this project has three goals: 1. fast waveform generation; 2. improved quality of generated waveforms; 3. generation of not only speech but also non-speech waveforms. In the first year, we proposed a family of neural source-filter (NSF) waveform models that combines the classical source-filter model of speech production with recent dilated convolutional neural networks, and we achieved these three goals. During the second year, we extended the proposed models toward the second and third goals. For the second goal, we enhanced the proposed models with a trainable cyclic-noise-based source module and demonstrated its improved performance when modeling multiple speakers' speech data with a single model; this work was published at Interspeech 2020. We also designed optional trainable digital FIR filters for the proposed models so that they can better model speech data with reverberation; this work was published at Interspeech 2020 and IEEE SLT 2021. For the third goal, we applied the proposed models to polyphonic piano sound modeling and demonstrated that the models work not only with monophonic but also with polyphonic sounds; a paper on this work is in preparation. Finally, we re-implemented and open-sourced the proposed models in PyTorch, a popular deep-learning framework.
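To illustrate the source-filter idea underlying the models described above, the following is a minimal NumPy sketch, not the project's actual implementation: a sine-based source module driven by an F0 contour produces an excitation signal, which is then shaped by an FIR filter stage. All function names, constants, and the fixed filter taps here are illustrative assumptions; in the actual models, the filter (a dilated convolutional network) and, in the extended version, the FIR taps are trainable.

```python
import numpy as np

def sine_source(f0, sr=16000, noise_std=0.003):
    """Generate a sine-based excitation from a sample-level F0 contour.

    Voiced samples (f0 > 0) get a sine wave whose phase is the running
    integral of F0; unvoiced samples fall back to Gaussian noise.
    This is a conceptual sketch, not the paper's exact source module.
    """
    phase = 2.0 * np.pi * np.cumsum(f0 / sr)       # instantaneous phase
    voiced = (f0 > 0).astype(float)
    sine = 0.1 * np.sin(phase)                     # harmonic component
    noise = noise_std * np.random.randn(len(f0))   # additive noise
    return voiced * (sine + noise) + (1.0 - voiced) * noise

def fir_stage(excitation, taps):
    """One FIR filter stage; in the extended models the taps are trainable."""
    return np.convolve(excitation, taps, mode="same")

sr = 16000
f0 = np.full(sr // 10, 220.0)                 # 0.1 s voiced segment at 220 Hz
excitation = sine_source(f0, sr)
waveform = fir_stage(excitation, np.array([0.5, 0.3, 0.2]))  # illustrative taps
```

In the actual models the filter module is conditioned on acoustic features (e.g. mel-spectrograms) and trained end-to-end; this sketch only shows the signal flow from F0-driven source to filtered waveform.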