
One model for all sounds: fast and high-quality neural source-filter model for speech and non-speech waveform modeling

Research Project

Project/Area Number 19K24371
Research Category

Grant-in-Aid for Research Activity Start-up

Allocation Type Multi-year Fund
Review Section 1002: Human informatics, applied informatics and related fields
Research Institution National Institute of Informatics

Principal Investigator

Wang Xin  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (60843141)

Project Period (FY) 2019-08-30 – 2021-03-31
Project Status Completed (Fiscal Year 2020)
Budget Amount
¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2019: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords Speech synthesis / Waveform modeling / Deep learning / Neural network
Outline of Research at the Start

Generating natural-sounding waveforms from a computer is a fundamental topic in speech science. In this research, we plan to combine speech science with deep learning: we propose to merge a classical speech production model, the source-filter model, with neural networks, resulting in a neural source-filter waveform model. The model is expected to generate waveforms faster and with improved quality; it is also expected to be applicable not only to speech but also to singing voices and non-speech sounds. Such a model will be useful in many applications, such as text-to-speech.
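The source-filter idea above can be sketched in a few lines: a source module turns an F0 contour into a sine-based excitation (noise in unvoiced regions), and a filter module then shapes that excitation into a waveform. The NumPy sketch below illustrates only the source module's excitation; the function name and constants are illustrative, and the actual NSF models pair this with trainable dilated-convolution filter blocks rather than any fixed filter.

```python
import numpy as np

def sine_excitation(f0, sr=16000, noise_std=0.003, amp=0.1):
    """Sine-based excitation from a per-sample F0 contour (Hz).

    Voiced samples (f0 > 0) get a sine whose phase accumulates with F0;
    unvoiced samples get Gaussian noise. A small noise term is also added
    everywhere, mirroring the additive noise in the NSF source module.
    """
    rng = np.random.default_rng(0)
    phase = 2 * np.pi * np.cumsum(f0 / sr)   # instantaneous phase
    voiced = f0 > 0
    excitation = np.where(voiced,
                          amp * np.sin(phase),
                          rng.normal(0.0, noise_std, size=f0.shape))
    return excitation + rng.normal(0.0, noise_std, size=f0.shape)

# Toy F0 contour: a 100 Hz voiced segment followed by an unvoiced segment.
f0 = np.concatenate([np.full(800, 100.0), np.zeros(800)])
e = sine_excitation(f0)
```

The key design point this illustrates is that F0 enters the model explicitly through the excitation, rather than being learned implicitly as in WaveNet-style models.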

Outline of Final Research Achievements

How to generate natural-sounding speech waveforms from a digital system is a fundamental question in speech science. By combining classical speech science, signal-processing methods, and recent deep-learning techniques, this research project proposed a family of neural waveform models called neural source-filter (NSF) models. It was demonstrated that the proposed NSF models can produce high-quality waveforms at a much faster speed than the commonly used WaveNet models. It was also demonstrated that the NSF models can be extended to incorporate other classical methods from the speech-modeling field, including the harmonic-plus-noise speech model. Finally, it was demonstrated that the NSF models can be applied to musical instrument audio, showing their flexibility and potential in modeling both speech and non-speech sounds.
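The harmonic-plus-noise extension mentioned above decomposes a signal at a maximum voiced frequency into a harmonic (low-band) part and a noise (high-band) part; in the project's harmonic-plus-noise NSF model that frequency is trainable rather than hand-set. A minimal NumPy sketch of the split itself, with an illustrative fixed 5 kHz cutoff and a hypothetical function name:

```python
import numpy as np

def hn_split(signal, sr=16000, max_voiced_freq=5000.0):
    """Split a waveform at `max_voiced_freq` into harmonic and noise bands
    via an FFT-domain mask; the two bands sum back to the original signal."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    harmonic = np.fft.irfft(spec * (freqs <= max_voiced_freq), n=len(signal))
    noise = np.fft.irfft(spec * (freqs > max_voiced_freq), n=len(signal))
    return harmonic, noise

# Toy signal: a 200 Hz tone (harmonic part) buried in broadband noise.
t = np.arange(1600) / 16000
x = np.sin(2 * np.pi * 200 * t) + 0.1 * np.random.default_rng(1).normal(size=1600)
h, n = hn_split(x)
```

Because the mask partitions the spectrum, `h + n` reconstructs the input exactly; the modeling question the project addressed is how to generate each band with its own neural branch and learn the cutoff jointly.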

Academic Significance and Societal Importance of the Research Achievements

Waveform-modeling techniques based on deep learning have been actively studied in recent years. While many models have been proposed that use deep-learning methods alone, this research combined deep learning with classical signal-processing techniques and proposed a model called the neural source-filter (NSF) waveform model. The proposed model demonstrates one way to combine deep learning with signal-processing methods, and it has already been used in real applications.

Report

(3 results)
  • 2020 Annual Research Report / Final Research Report (PDF)
  • 2019 Research-status Report
  • Research Products

    (18 results)


All: Int'l Joint Research (3 results), Journal Article (3 results; of which Int'l Joint Research: 1 result, Peer Reviewed: 3 results, Open Access: 3 results), Presentation (6 results; of which Int'l Joint Research: 5 results, Invited: 3 results), Remarks (6 results)

  • [Int'l Joint Research] USTC (China)

    • Related Report
      2020 Annual Research Report
  • [Int'l Joint Research] University of Edinburgh (UK)

    • Related Report
      2019 Research-status Report
  • [Int'l Joint Research] Aalto University (Finland)

    • Related Report
      2019 Research-status Report
  • [Journal Article] Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis (2020)

    • Author(s)
      Wang Xin, Takaki Shinji, Yamagishi Junichi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 28 Pages: 402-415

    • DOI

      10.1109/taslp.2019.2956145

    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Transferring neural speech waveform synthesizers to musical instrument sounds generation (2020)

    • Author(s)
      Zhao Yi, Wang Xin, Juvela Lauri, Yamagishi Junichi
    • Journal Title

      IEEE International Conference on Acoustics, Speech and Signal Processing

      Volume: - Pages: 6269-6273

    • DOI

      10.1109/icassp40776.2020.9053047

    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis (2019)

    • Author(s)
      Wang Xin, Yamagishi Junichi
    • Journal Title

      Proceeding of Speech Synthesis Workshop

      Volume: - Pages: 1-6

    • DOI

      10.21437/ssw.2019-1

    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation (2021)

    • Author(s)
      Ai Yang, Li Haoyu, Wang Xin, Yamagishi Junichi, Ling Zhenhua
    • Organizer
      2021 IEEE Spoken Language Technology Workshop (SLT)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Neural auto-regressive, source-filter and glottal vocoders for speech and music signals (2020)

    • Author(s)
      Yamagishi Junichi, Wang Xin
    • Organizer
      ISCA 2020 Speech Processing Courses in Crete
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Tutorial on Neural statistical parametric speech synthesis (2020)

    • Author(s)
      Wang Xin
    • Organizer
      The Speaker and Language Recognition Workshop, Odyssey 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Using Cyclic Noise as the Source Signal for Neural Source-Filter-Based Speech Waveform Model (2020)

    • Author(s)
      Wang Xin, Yamagishi Junichi
    • Organizer
      Proc. Interspeech
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Reverberation Modeling for Source-Filter-Based Neural Vocoder (2020)

    • Author(s)
      Ai Yang, Wang Xin, Yamagishi Junichi, Ling Zhenhua
    • Organizer
      Proc. Interspeech
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Neural-network-based waveform modeling for text-to-speech synthesis (2019)

    • Author(s)
      Wang Xin
    • Organizer
      Lecture Series on Natural Language Processing
    • Related Report
      2019 Research-status Report
    • Invited
  • [Remarks] Home page of neural source-filter waveform models

    • URL

      https://nii-yamagishilab.github.io/samples-nsf/

    • Related Report
      2020 Annual Research Report 2019 Research-status Report
  • [Remarks] Neural source-filter waveform model in Pytorch

    • URL

      https://github.com/nii-yamagishilab/project-NN-Pytorch-scripts

    • Related Report
      2020 Annual Research Report
  • [Remarks] Neural source-filter waveform model in CUDA

    • URL

      https://github.com/nii-yamagishilab/project-CURRENNT-public

    • Related Report
      2020 Annual Research Report
  • [Remarks] Scripts to use the CUDA implementation

    • URL

      https://github.com/nii-yamagishilab/project-CURRENNT-scripts

    • Related Report
      2020 Annual Research Report
  • [Remarks] Neural source-filter waveform model source code

    • URL

      https://github.com/nii-yamagishilab/project-CURRENNT-public

    • Related Report
      2019 Research-status Report
  • [Remarks] Scripts to train and use the proposed models

    • URL

      https://github.com/nii-yamagishilab/project-CURRENNT-scripts

    • Related Report
      2019 Research-status Report

Published: 2019-09-03   Modified: 2022-01-27  
