Outline of Research at the Start |
Voice conversion (VC) is the task of converting one kind of speech into another, and it can help break down barriers to human communication, especially for patients with speech disorders. In this research, we plan to build production-level VC systems. First, real-time, low-latency VC techniques need to be developed so that on-device systems can be built for users. Second, multimodal signals, such as human body movements or digital instruments, allow flexible control of various aspects of speech, including emotion. A unified system can then be developed as a prototype demo.
|
Outline of Annual Research Achievements |
The purpose of this research is to apply voice conversion (VC) to realize an interactive speech production paradigm for real-world applications, with the help of multimodal signals and real-time processing techniques. In the third year, we worked on improving both fundamental VC techniques and real-time processing techniques, concentrating on three aspects. (1) We organized the Singing Voice Conversion Challenge 2023, a challenge aimed at improving and promoting singing voice conversion, a special application of VC. We co-organized the challenge with Tencent AI Lab, China and CMU, USA, and held a special session at ASRU 2023, a flagship conference in speech processing. (2) We launched the VoiceMOS Challenge 2023, the second edition of a scientific event that encourages research on the automatic prediction of Mean Opinion Scores (MOS) for synthesized speech. This year the focus was on a real-world, zero-shot setting, and the challenge attracted 10 teams from academia and industry. Again, we co-organized the challenge with NII, Japan and Academia Sinica, Taiwan, and held a special session, also at ASRU 2023. (3) We proposed a sequence-to-sequence VC model with a non-autoregressive architecture that can be executed in real time. Compared to previous work, the training pipeline is simplified, and performance is robust to reduced training data, which is an important property for VC. The results were presented at ASJ2024, and we plan to submit a journal paper.
|