Project/Area Number |
17K12720
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
Perceptual information processing
|
Research Institution | National Institute of Informatics |
Principal Investigator |
Henter Gustav, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Researcher (30793096)
|
Project Period (FY) |
2017-04-01 – 2018-03-31
|
Project Status |
Discontinued (Fiscal Year 2017)
|
Budget Amount |
¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)
Fiscal Year 2018: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2017: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
|
Keywords | Speech synthesis / Latent variables / Controllable synthesis / Deep learning / Emotional speech / Control |
Outline of Annual Research Achievements |
With this grant, I have derived and published theoretical connections between common heuristic methods for unsupervised learning of controllable speech synthesisers and latent variables in Bayesian probability, including how common extensions of the practical approach can be given a probabilistic interpretation. Related work (both published and submitted) explored the optimal supervised methods for annotating the same data and, separately, considered speech synthesis with multilingual phonetic control. A listening test is currently under way to compare the aforementioned supervised and unsupervised approaches with variational autoencoders (VAEs). A journal manuscript presenting the results, together with new theoretical connections between VAEs and common synthesis heuristics, is in preparation.
|