Summary of Research Achievements |
The purpose of this research is to apply voice conversion (VC) to realize an interactive speech production paradigm for real-world applications, with the help of multimodal signals and real-time processing techniques. In the third year, we worked on improving both fundamental VC techniques and real-time processing techniques, focusing on three aspects. (1) We organized the Singing Voice Conversion Challenge 2023, a challenge aimed at improving and promoting singing voice conversion, a special application of VC. We co-organized the challenge with Tencent AI Lab, China and CMU, USA, and held a special session at ASRU 2023, a flagship conference in speech processing. (2) We launched the VoiceMOS Challenge 2023, the second edition of a scientific event that encourages research on the automatic prediction of Mean Opinion Scores (MOS) for synthesized speech. This year the focus was on a real-world, zero-shot setting, and the challenge attracted 10 teams from academia and industry. Again, we co-organized the challenge, this time with NII, Japan and Academia Sinica, Taiwan, and held a special session at ASRU 2023 as well. (3) We proposed a sequence-to-sequence VC model with a non-autoregressive architecture that can run in real time. Compared to previous work, the training pipeline is simplified, and performance is robust to reduced training data, which is an important property for VC. The results were presented at ASJ2024, and we plan to submit a journal paper.
|