2022 Fiscal Year Annual Research Report
Augmented speech communication using multi-modal signals with real-time, low-latency voice conversion
Project/Area Number |
21J20920
|
Allocation Type | Single-year Grants |
Research Institution | Nagoya University |
Principal Investigator |
HUANG WENCHIN Nagoya University, Graduate School of Informatics, Research Fellow (DC1)
|
Project Period (FY) |
2021-04-28 – 2024-03-31
|
Keywords | voice conversion |
Outline of Annual Research Achievements |
The purpose of this research is to apply voice conversion (VC) to realize an interactive speech production paradigm for real-world applications, with the help of multimodal signals and real-time processing techniques. In the second year, the applicant focused on three aspects. (1) Continued improvement of fundamental VC techniques, specifically self-supervised speech representation (S3R)-based VC, an emerging trend that reduces training data requirements. The applicant continued to update S3PRL-VC, an open-source toolkit for researchers to evaluate S3R models for VC, and published the latest experimental results in the IEEE Journal of Selected Topics in Signal Processing. (2) Foreign accent conversion, a task that helps reduce foreign accents for efficient communication. A paper that provides a unified evaluation of current approaches and identifies unsolved problems has been submitted to an international conference and is currently under review. (3) Singing voice conversion, a fundamental technique that has the potential to augment human communication ability. The applicant is running a scientific event named the Singing Voice Conversion Challenge 2023, which aims to provide a unified experimental setting, including task and dataset, in order to attract researchers worldwide to look into this problem and explore the limitations of state-of-the-art techniques.
|
Current Status of Research Progress |
2: Research has progressed, on the whole, more than originally planned.
Reason
Due to COVID-19, the applicant paused the project from April 2022 to December 2022, so little progress was made during that period. The applicant has since returned to Japan and resumed work on the project as soon as possible. No significant research results have been obtained yet, but many projects have been started. The applicant therefore considers the progress to be on track.
|
Strategy for Future Research Activity |
The applicant plans to focus on the following in the third year: (1) Continuing to run the Singing Voice Conversion Challenge 2023. The challenge is expected to end within 2023, and the results will be published at international conferences. (2) Improving communication augmentation applications of VC, including accent conversion and singing voice conversion. (3) Developing real-time, low-latency VC. The applicant has started an initial investigation in this direction, and the final goal is to build a demo system.
|