Summary of Research Achievements |
We developed methods for text-to-speech (TTS) synthesis for low-resource languages that use smaller amounts of data as well as data from less traditional sources. First, we developed an approach to building TTS corpora from podcast data, using Hebrew as a case study, which resulted in a publicly available dataset. Next, we developed a data processing pipeline and TTS system that can be repurposed for other low-resource languages with similar available data, resulting in one peer-reviewed publication at Interspeech 2023. Finally, we continued investigating self-supervised speech representations as an intermediate representation for multilingual TTS that can be fine-tuned to a new language.
Having previously identified automatic evaluation of TTS as a critical issue, especially for low-resource languages, we continued the VoiceMOS Challenge, a shared task for automatic TTS evaluation, by running a second edition focused on zero-shot multi-domain scenarios. The challenge was presented as a special session at ASRU 2023 and attracted ten teams from academia and industry. We also studied contextual effects on listener ratings, the ability of self-supervised speech models to predict speech quality, and a ranking-based quality prediction approach, resulting in three additional peer-reviewed publications.
|