2023 Fiscal Year Annual Research Report
Language-independent, multi-modal, and data-efficient approaches for speech synthesis and translation
Project/Area Number |
21K11951
|
Research Institution | National Institute of Informatics |
Principal Investigator |
Cooper Erica National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Associate Professor (30843156)
|
Co-Investigator (Kenkyū-buntansha) |
Kruengkrai Canasai National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (10895907) [Withdrawn]
|
Project Period (FY) |
2021-04-01 – 2024-03-31
|
Keywords | text-to-speech synthesis / low-resource languages / speech evaluation |
Outline of Annual Research Achievements |
We developed methods for text-to-speech (TTS) synthesis for low-resource languages using smaller amounts of data as well as data from less traditional sources. First, we developed an approach to building TTS corpora from podcast data, using Hebrew as a case study, which resulted in a publicly available dataset. Next, we developed a data processing pipeline and TTS system that can be repurposed for other low-resource languages with similar available data, resulting in one peer-reviewed publication at Interspeech 2023. Finally, we continued investigating self-supervised speech representations as an intermediate representation for multilingual TTS that can be fine-tuned to a new language.
Having previously identified automatic evaluation of TTS as a critical issue, especially for low-resource languages, we continued the VoiceMOS Challenge, a shared task for automatic TTS evaluation, by running a second edition focused on zero-shot multi-domain scenarios. The challenge was presented as a special session at ASRU 2023 and attracted ten teams from academia and industry. We also studied contextual effects on listener ratings, the ability of self-supervised speech models to predict speech quality, and a ranking-based quality prediction approach, resulting in three additional peer-reviewed publications.
|
Remarks |
Various publicly available datasets, open-source code repositories, and webpages related to the work conducted during this year of the project.
|
Research Products
(14 results)