2023 Fiscal Year Annual Research Report

Augmenting Speech Communication through Real-Time, Low-Latency Voice Conversion Using Multimodal Signals

Research Project

Project/Area Number	22KJ1519
Allocation Type	Multi-year Fund
Research Institution	Nagoya University
Principal Investigator	HUANG WENCHIN Nagoya University, Graduate School of Informatics, Research Fellow (DC1)
Project Period (FY)	2023-03-08 – 2024-03-31
Keywords	voice conversion
Outline of Annual Research Achievements	The purpose of this research is to apply voice conversion (VC) to realize an interactive speech production paradigm for real-world applications, with the help of multimodal signals and real-time processing techniques. In the third year, we worked on improving both fundamental VC techniques and real-time processing techniques, focusing on three aspects. (1) We organized the Singing Voice Conversion Challenge 2023, which aimed to improve and promote singing voice conversion, a special application of VC. We co-organized the challenge with Tencent AI Lab, China, and CMU, USA, and held a special session at ASRU 2023, a flagship conference in speech processing. (2) We launched the VoiceMOS Challenge 2023, the second edition of a scientific event that encourages research on the automatic prediction of Mean Opinion Scores (MOS) for synthesized speech. This year the focus was on a real-world, zero-shot setting, and the challenge attracted 10 teams from academia and industry. We co-organized the challenge with NII, Japan, and Academia Sinica, Taiwan, and held a special session, also at ASRU 2023. (3) We proposed a non-autoregressive sequence-to-sequence VC model that can run in real time. Compared to previous work, the training pipeline is simplified and performance is robust to reduced training data, an important property for VC. The results were presented at ASJ2024, and we plan to submit a journal paper.

Research Products
(4 results)

All Presentation (4 results) (of which Int'l Joint Research: 3 results)

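[Presentation] AAS-VC: Robustness of Alignment Learning in Non-Autoregressive Sequence-to-Sequence Voice Conversion (2024)
- Author(s)
  Wen-Chin Huang, Kazuhiro Kobayashi, Tomoki Toda
- Organizer
  Proceedings of the Acoustical Society of Japan (ASJ) Meeting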
[Presentation] Evaluating methods for ground-truth-free foreign accent conversion (2023)
- Author(s)
  Wen-Chin Huang, Tomoki Toda
- Organizer
  APSIPA ASC
- Int'l Joint Research
[Presentation] The Singing Voice Conversion Challenge 2023 (2023)
- Author(s)
  Wen-Chin Huang, Lester Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda
- Organizer
  ASRU
- Int'l Joint Research
[Presentation] The VoiceMOS Challenge 2023: zero-shot subjective speech quality prediction for multiple domains (2023)
- Author(s)
  Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
- Organizer
  ASRU
- Int'l Joint Research
