2022 Fiscal Year Annual Research Report

Augmented speech communication using multi-modal signals with real-time, low-latency voice conversion

Research Project

Project/Area Number	21J20920
Allocation Type	Single-year Grants
Research Institution	Nagoya University
Principal Investigator	HUANG WENCHIN Nagoya University, Graduate School of Informatics, JSPS Research Fellow (DC1)
Project Period (FY)	2021-04-28 – 2024-03-31
Keywords	voice conversion
Outline of Annual Research Achievements	The purpose of this research is to apply voice conversion (VC) to realize an interactive speech production paradigm for real-world applications, with the help of multimodal signals and real-time processing techniques. In the second year, the applicant focused on three aspects. (1) Continued improvement of fundamental VC techniques, specifically self-supervised speech representation (S3R)-based VC, an emerging trend that reduces training data requirements. The applicant continued to update S3PRL-VC, an open-source toolkit for researchers to evaluate S3R models for VC, and published the latest experimental results in the IEEE Journal of Selected Topics in Signal Processing. (2) Foreign accent conversion, a task that reduces foreign accents to make communication more efficient. A paper that provides a unified evaluation of current approaches and identifies unsolved problems has been submitted to an international conference and is currently under review. (3) Singing voice conversion, a fundamental technique with the potential to augment human communication ability. The applicant is organizing a scientific event, the Singing Voice Conversion Challenge 2023, which aims to provide a unified experimental setting, including task and dataset, in order to attract researchers worldwide to study this problem and explore the limitations of state-of-the-art techniques.
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Due to COVID-19, the applicant paused the project from April 2022 to December 2022, so little progress was made during that period. Nonetheless, the applicant was eventually able to return to Japan and resumed work on the project as soon as possible. No significant research results have been obtained yet, but many projects are being kicked off. The applicant therefore considers the progress to be on track.
Strategy for Future Research Activity	The applicant plans to focus on the following in the third year: (1) Continue running the Singing Voice Conversion Challenge 2023. The challenge is expected to end within 2023, and the results will be published at international conferences. (2) Improve communication-augmentation applications of VC, including accent conversion and singing voice conversion. (3) Real-time, low-latency VC. The applicant has begun an initial investigation in this direction, and the final goal is to build a demo system.

Research Products
(5 results)

Journal Article: 1 (of which Peer Reviewed: 1); Presentation: 4 (of which Int'l Joint Research: 4)

[Journal Article] A Comparative Study of Self-Supervised Speech Representation Based Voice Conversion (2022)
- Author(s)
  Huang Wen-Chin, Yang Shu-Wen, Hayashi Tomoki, Toda Tomoki
- Journal Title
  IEEE Journal of Selected Topics in Signal Processing
  Volume: 16 Pages: 1308-1318
- DOI
  10.1109/JSTSP.2022.3193761
- Peer Reviewed
[Presentation] The VoiceMOS Challenge 2022 (2022)
- Author(s)
  Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
- Organizer
  INTERSPEECH 2022
- Int'l Joint Research
[Presentation] S3PRL-VC: Open-source voice conversion framework with self-supervised speech representations (2022)
- Author(s)
  Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda
- Organizer
  2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] Towards identity preserving normal to dysarthric voice conversion (2022)
- Author(s)
  Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda
- Organizer
  2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
[Presentation] LDNet: Unified listener dependent modeling in MOS prediction for synthetic speech (2022)
- Author(s)
  Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda
- Organizer
  2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Int'l Joint Research
