2020 Fiscal Year Final Research Report
One model for all sounds: fast and high-quality neural source-filter model for speech and non-speech waveform modeling
Project/Area Number | 19K24371
Research Category | Grant-in-Aid for Research Activity Start-up
Allocation Type | Multi-year Fund
Review Section | 1002: Human informatics, applied informatics and related fields
Research Institution | National Institute of Informatics
Principal Investigator | Wang Xin, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (60843141)
Project Period (FY) | 2019-08-30 – 2021-03-31
Keywords | Speech synthesis / Waveform modeling / Deep learning / Neural network
Outline of Final Research Achievements
How to generate natural-sounding speech waveforms with a digital system is a fundamental question in speech science. By combining classical speech science, signal processing methods, and recent deep-learning techniques, this research project proposed a family of neural waveform models called neural source-filter (NSF) waveform models. The proposed NSF models were shown to produce high-quality waveforms much faster than the commonly used WaveNet models. The NSF models were also shown to be extensible with other classical methods from the speech modeling field, including the harmonic-plus-noise speech model. Finally, the NSF model was applied to instrumental music audio, demonstrating its flexibility and its potential for modeling both speech and non-speech sounds.
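To make the "source-filter" idea concrete, the sketch below shows one way a source module can turn a per-sample fundamental frequency (F0) contour into an excitation signal that a neural filter module would then reshape into a waveform. It is a minimal illustration under stated assumptions, not the project's implementation: the function name `sine_excitation`, the 16 kHz sample rate, and the constants `alpha = 0.1` and `sigma = 0.003` are hypothetical choices made here. Voiced samples (F0 > 0) get a sine wave whose phase accumulates with F0 plus a little Gaussian noise; unvoiced samples (F0 = 0) get noise only. The neural filter module itself is not shown.

```python
import math
import random


def sine_excitation(f0, sample_rate=16000, alpha=0.1, sigma=0.003, seed=0):
    """Hypothetical sketch of an NSF-style sine-based source signal.

    f0: per-sample fundamental frequency in Hz (0.0 marks unvoiced samples).
    alpha, sigma, sample_rate: illustrative assumptions, not official values.
    """
    rng = random.Random(seed)
    phi = rng.uniform(-math.pi, math.pi)  # random initial phase
    phase = 0.0
    excitation = []
    for f in f0:
        noise = rng.gauss(0.0, sigma)
        if f > 0:
            # Accumulate instantaneous phase so the sine tracks the F0 contour;
            # wrap to [0, 2*pi) to keep the value small over long signals.
            phase = (phase + 2.0 * math.pi * f / sample_rate) % (2.0 * math.pi)
            excitation.append(alpha * math.sin(phase + phi) + noise)
        else:
            # Unvoiced: Gaussian noise with standard deviation alpha / 3.
            excitation.append(alpha / (3.0 * sigma) * noise)
    return excitation
```

For example, `sine_excitation([100.0] * 160 + [0.0] * 160)` produces one 10 ms period of a 100 Hz sine at 16 kHz followed by 10 ms of noise. Because the source signal is computed directly from F0 rather than sample by sample from previous outputs, this step does not require the autoregressive loop used by WaveNet.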
Free Research Field | Perceptual information processing
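Academic Significance and Societal Importance of the Research Achievements
Speech waveform modeling with deep learning has been actively studied in recent years. While many models have been proposed that rely on deep-learning methods alone, this research combined deep learning with classical signal processing techniques and proposed a model called the neural source-filter (NSF) waveform model. The proposed model shows one way to combine deep learning and signal processing methods, and it has been used in real applications.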