
2021 Fiscal Year Final Research Report

A Study of Deep Gaussian Process Based Statistical Speech Synthesis

Project/Area Number 19K20292
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution The University of Tokyo

Principal Investigator

Koriyama Tomoki  The University of Tokyo, Graduate School of Information Science and Technology, Lecturer (50749124)

Project Period (FY) 2019-04-01 – 2022-03-31
Keywords Gaussian process / deep learning / speech synthesis / latent variable model / time-series model
Outline of Final Research Achievements

We proposed an extension of deep Gaussian process (DGP)-based speech synthesis that enables time-series modeling of speech characteristics. Specifically, we proposed DGPs with recurrent, self-attention, and sequence-to-sequence architectures. The proposed speech synthesis methods tend to generate more natural-sounding speech than DNN-based methods with comparable architectures. The results of this research project show that various network structures can be used in DGPs in much the same way as in DNNs, and that robust deep learning that exploits Bayesian properties is possible.

Free Research Field

Speech information processing

Academic Significance and Societal Importance of the Research Achievements

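At present, much machine learning research uses DNNs as its basic building block, but tuning the hyperparameters for DNN training is laborious, and building machine learning models has become something of a craft. Gaussian process regression has been attracting attention as a possible alternative, but it has suffered from low flexibility, which prevents it from being applied to a wide variety of data. Through the applied experiments in this research, we demonstrated that the flexibility of Gaussian process regression improves when it is used as a deep model. This result points to one path toward robust training of highly flexible deep learning models, not only for speech.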

Published: 2023-01-30  

Powered by NII kakenhi