Project/Area Number | 21K11951
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Review Section | Basic Section 61010: Perceptual information processing-related
Research Institution | National Institute of Informatics
Principal Investigator | COOPER Erica, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Associate Professor (30843156)
Co-Investigator (Kenkyū-buntansha) | KRUENGKRAI Canasai, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (10895907)
Project Period (FY) | 2021-04-01 – 2024-03-31
Project Status | Completed (Fiscal Year 2023)
Budget Amount | ¥4,160,000 (Direct Cost: ¥3,200,000; Indirect Cost: ¥960,000)
Fiscal Year 2023: ¥1,300,000 (Direct Cost: ¥1,000,000; Indirect Cost: ¥300,000)
Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000; Indirect Cost: ¥330,000)
Fiscal Year 2021: ¥1,430,000 (Direct Cost: ¥1,100,000; Indirect Cost: ¥330,000)
Keywords | Text-to-speech synthesis / Low-resource languages / Neural network pruning / Evaluation / speech evaluation / speech synthesis / self-supervised learning / speech assessment / mean opinion score / text-to-speech / vocoder / pruning / efficiency / multi-lingual / machine translation / deep learning / neural networks
Outline of Research at the Start |
Language technology has improved thanks to advances in neural-network-based approaches; speech synthesis, for example, has reached the quality of human speech. However, neural models require large quantities of data. Speech technologies bring social benefits in accessibility and communication; to ensure broad access to these benefits, we consider language-independent methods that can make use of less data. We propose 1) articulatory-class-based end-to-end speech synthesis; 2) multi-modal machine translation using both text and speech; and 3) neural architecture search for data-efficient architectures.
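The articulatory-class idea in item 1 can be made concrete with a toy example: phonemes are encoded by shared articulatory features (place, manner, voicing) rather than as atomic symbols, so a phoneme that is rare in a low-resource language can share parameters with similar sounds in better-resourced languages. The small inventory and feature set below are hypothetical illustrations in Python, not the project's actual encoding.

    # Toy illustration: encode phonemes as articulatory-class features
    # so that similar sounds across languages share representation.
    # The inventory and feature sets are illustrative, not the project's.
    ARTICULATORY = {
        "p": ("bilabial", "stop", "voiceless"),
        "b": ("bilabial", "stop", "voiced"),
        "t": ("alveolar", "stop", "voiceless"),
        "d": ("alveolar", "stop", "voiced"),
        "s": ("alveolar", "fricative", "voiceless"),
        "m": ("bilabial", "nasal", "voiced"),
        "n": ("alveolar", "nasal", "voiced"),
    }
    PLACES = ["bilabial", "alveolar"]
    MANNERS = ["stop", "fricative", "nasal"]
    VOICINGS = ["voiceless", "voiced"]

    def phoneme_to_features(ph: str) -> list[int]:
        """Concatenated one-hot articulatory feature vector for a phoneme."""
        place, manner, voicing = ARTICULATORY[ph]
        return ([int(p == place) for p in PLACES]
                + [int(m == manner) for m in MANNERS]
                + [int(v == voicing) for v in VOICINGS])

    # "b" and "d" differ only in place of articulation, so their
    # feature vectors overlap heavily:
    print(phoneme_to_features("b"))  # [1, 0, 1, 0, 0, 0, 1]
    print(phoneme_to_features("d"))  # [0, 1, 1, 0, 0, 0, 1]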
Outline of Final Research Achievements |
We explored pruning for lightweight text-to-speech synthesis (TTS), developed data-efficient TTS for low-resource languages, and advanced the field of automatic quality prediction for TTS. We found that up to 90% of a TTS model's weights can be pruned without reducing output quality. We developed a data processing pipeline for building TTS corpora for low-resource languages from podcast data, resulting in a large-scale, high-quality, publicly available dataset, along with a TTS system trained on this data that can be repurposed for any language with similar data. Since self-supervised speech representations have proven effective for many downstream tasks, we next investigated them as an intermediate representation for TTS trained on multilingual data, which can be fine-tuned to a new language. Finally, we identified automatic evaluation of TTS as a critical topic and launched a series of challenges for this task in 2022 and 2023, which attracted many participants and advanced the field.
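The pruning result can be sketched with unstructured magnitude pruning as provided by PyTorch's torch.nn.utils.prune. The tiny model and 90% sparsity target below illustrate the general technique only, not the project's exact pruning recipe, which in practice interleaves pruning with (re)training to preserve output quality.

    # Minimal sketch of unstructured L1-magnitude pruning with PyTorch.
    # TinyModel is a stand-in for a real TTS acoustic model.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80)
            )

        def forward(self, x):
            return self.layers(x)

    model = TinyModel()
    # Zero out the 90% of weights with the smallest L1 magnitude, per layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.9)
            prune.remove(module, "weight")  # make the sparsity permanent

    total = sum(p.numel() for p in model.parameters())
    zeros = sum((p == 0).sum().item() for p in model.parameters())
    print(f"sparsity: {zeros / total:.1%}")  # ~90% of weights are zero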
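The automatic quality prediction work can likewise be sketched as a small regression head over self-supervised speech features, trained against human mean opinion scores. The wav2vec 2.0 checkpoint, mean pooling, and linear head below are illustrative choices under that general recipe, not the exact challenge baseline.

    # Sketch of a MOS predictor: pool self-supervised speech features and
    # regress a scalar quality score. Checkpoint, pooling, and head size
    # are illustrative; real systems are trained on human MOS ratings.
    import torch
    import torch.nn as nn
    import torchaudio

    bundle = torchaudio.pipelines.WAV2VEC2_BASE  # expects 16 kHz audio
    ssl_encoder = bundle.get_model()

    class MOSPredictor(nn.Module):
        def __init__(self, encoder, feature_dim=768):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(feature_dim, 1)

        def forward(self, waveform):  # waveform: (batch, samples)
            features, _ = self.encoder.extract_features(waveform)
            pooled = features[-1].mean(dim=1)  # average over time frames
            return self.head(pooled).squeeze(-1)  # one score per utterance

    predictor = MOSPredictor(ssl_encoder)
    wav = torch.randn(1, 16000)  # one second of dummy audio
    print(predictor(wav))  # untrained output; train with MSE vs. human MOS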
Academic Significance and Societal Importance of the Research Achievements |
We developed TTS models that are lightweight and trainable on small amounts of data, and we advanced the field of TTS evaluation. This benefits researchers and society by reducing barriers to entry for creating TTS for low-resource languages, extending the accessibility benefits of TTS to a broader audience.