2023 Fiscal Year Annual Research Report
Augmenting Speech Communication via Real-Time, Low-Latency Voice Conversion Using Multimodal Signals
Project/Area Number | 22KJ1519 |
Allocation Type | Multi-year Fund |
Research Institution | Nagoya University |
Principal Investigator | HUANG WENCHIN, Nagoya University, Graduate School of Informatics, Research Fellow (DC1) |
Project Period (FY) | 2023-03-08 – 2024-03-31 |
Keywords | voice conversion |
Outline of Annual Research Achievements | The purpose of this research is to apply voice conversion (VC) to realize an interactive speech production paradigm for real-world applications, with the help of multimodal signals and real-time processing techniques. In the third year, we worked on improving both fundamental VC techniques and real-time processing techniques, focusing on three aspects. (1) We organized the Singing Voice Conversion Challenge 2023, which aimed to improve and promote singing voice conversion, a special application of VC. We co-organized the challenge with Tencent AI Lab, China and CMU, USA, and held a special session at ASRU 2023, a flagship conference in speech processing. (2) We launched the VoiceMOS Challenge 2023, the second edition of a scientific event that encourages research on the automatic prediction of Mean Opinion Scores (MOS) for synthesized speech. This year the focus was on a real-world, zero-shot setting, and the challenge attracted 10 teams from academia and industry. Again, we co-organized the challenge with NII, Japan and Academia Sinica, Taiwan, and held a special session, also at ASRU 2023. (3) We proposed a sequence-to-sequence VC model with a non-autoregressive architecture that can be executed in real time. Compared with previous work, its training pipeline is simpler, and its performance is robust to reduced training data, an important property for VC. The results were presented at ASJ2024, and we plan to submit a journal paper. |