2022 Fiscal Year Research-status Report

Language-independent, multi-modal, and data-efficient approaches for speech synthesis and translation

Research Project

Project/Area Number 21K11951
Research Institution National Institute of Informatics

Principal Investigator

Cooper Erica  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (30843156)

Co-Investigator (Kenkyū-buntansha) Kruengkrai Canasai  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (10895907)
Project Period (FY) 2021-04-01 – 2024-03-31
Keywords speech synthesis / self-supervised learning / low-resource languages / speech assessment / mean opinion score
Outline of Annual Research Achievements

In this second year of the project, we looked at two main topics: language-independent, data-efficient text-to-speech synthesis for low-resource languages using self-supervised speech representations, and automatic mean opinion score prediction.

Self-supervised representations for speech have proven useful for many downstream speech-related tasks, and have been shown to contain phonetic information. We therefore chose these as an intermediate representation for a text-to-speech synthesis model that is trained on data from many languages and can then be fine-tuned to a new language using only a small amount of data. This work is ongoing, in collaboration with researchers from the National Research Council of Canada and the University of Edinburgh.

We have also identified automatic evaluation of synthesized speech as an important topic for low-resource languages, since finding listeners to participate in listening tests can be especially difficult for these languages. In collaboration with Nagoya University and Academia Sinica, we co-organized the first VoiceMOS Challenge, a shared task for automatic mean opinion score (MOS) prediction for synthesized speech. The challenge attracted 22 participating teams from academia and industry, and we ran a special session about the challenge at Interspeech 2022. This challenge has advanced the field by generating a great deal of interest in this topic.
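As a rough illustration of the task, the sketch below computes a MOS per utterance as the average of listener ratings and then compares a predictor's outputs against those averages. The utterance names, ratings, predicted values, and the choice of mean squared error and Pearson correlation as metrics are all hypothetical assumptions made for this example; they are not taken from the challenge data or its official evaluation protocol.

```python
from math import sqrt


def mean_opinion_score(ratings):
    """Average a list of listener ratings (for example, on a 1-5 scale)."""
    return sum(ratings) / len(ratings)


def mean_squared_error(predicted, actual):
    """Mean of squared differences between predicted and actual scores."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(xs, ys):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical listening-test results: each utterance was rated by four listeners.
listener_ratings = {
    "utt_a": [4, 5, 4, 4],
    "utt_b": [2, 3, 2, 1],
    "utt_c": [3, 3, 4, 2],
}

# Hypothetical outputs of an automatic MOS predictor for the same utterances.
predicted_mos = {"utt_a": 4.0, "utt_b": 2.5, "utt_c": 3.0}

utterances = sorted(listener_ratings)
true_mos = [mean_opinion_score(listener_ratings[u]) for u in utterances]
pred_mos = [predicted_mos[u] for u in utterances]

mse = mean_squared_error(pred_mos, true_mos)
lcc = pearson_correlation(pred_mos, true_mos)

print(f"MSE: {mse:.4f}, linear correlation: {lcc:.4f}")
```

A predictor that tracks listener judgments well would show a low error and a correlation close to 1, which is what makes such a model a candidate substitute for listening tests when listeners are hard to recruit.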

Current Status of Research Progress

2: Research has progressed on the whole more than it was originally planned.

Reason

We originally proposed to work on language-independent speech synthesis in the first year of the project, but worked instead on efficient speech synthesis architectures using neural network pruning (originally scheduled for the third year). We therefore worked on language-independent approaches for speech synthesis this year. While we had originally planned to investigate articulatory features for this purpose, we shifted our focus to self-supervised speech representations, since these appear promising and well suited to our task.

Although it was outside of our original proposal, automatic speech quality assessment has arisen during this project as an important and relevant topic. The ability to automatically predict the quality of synthesized speech, especially for low-resource languages, will facilitate future research in low-resource speech synthesis.

Strategy for Future Research Activity

Although we changed the order of the topics in the original plan somewhat, the topic of multimodal text and speech modeling for low-resource languages remains, and we will therefore focus on it in the third year. We will also continue our ongoing research on language-independent speech synthesis that is adaptable to low-resource languages, and we will run the 2023 edition of the VoiceMOS Challenge, which focuses on zero-shot prediction of out-of-domain synthesized speech.

Causes of Carryover

Travel expenses were not used due to the ongoing coronavirus situation in 2022.

The budget remaining will be used for attending international conferences in 2023.

Remarks

Official website for the VoiceMOS Challenge 2022, and open-source code for the SSL-based MOS predictor, which was one of the baseline systems for the challenge.

  • Research Products

    (11 results)

Int'l Joint Research (3 results) / Presentation (6 results, of which Int'l Joint Research: 3 results, Invited: 3 results) / Remarks (2 results)

  • [Int'l Joint Research] Academia Sinica (Other countries/regions: Taiwan)

    • Country Name
      Other countries/regions
    • Counterpart Institution
      Academia Sinica
  • [Int'l Joint Research] National Research Council (Canada)

    • Country Name
      CANADA
    • Counterpart Institution
      National Research Council
  • [Int'l Joint Research] University of Edinburgh (United Kingdom)

    • Country Name
      UNITED KINGDOM
    • Counterpart Institution
      University of Edinburgh
  • [Presentation] Generalization Ability of MOS Prediction Networks (2022)

    • Author(s)
      Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi
    • Organizer
      ICASSP 2022
    • Int'l Joint Research
  • [Presentation] LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech (2022)

    • Author(s)
      Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda
    • Organizer
      ICASSP 2022
    • Int'l Joint Research
  • [Presentation] The VoiceMOS Challenge 2022 (2022)

    • Author(s)
      Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
    • Organizer
      Interspeech 2022
    • Int'l Joint Research
  • [Presentation] The VoiceMOS Challenge: Data-Driven Mean Opinion Score Prediction for Synthesized Speech (2022)

    • Author(s)
      Erica Cooper
    • Organizer
      2022 Autumn Meeting of the Acoustical Society of Japan
    • Invited
  • [Presentation] Objective Evaluation in TTS (2022)

    • Author(s)
      Erica Cooper
    • Organizer
      KTH Seminar on Speech Synthesis Evaluation, KTH Royal Institute of Technology, Department of Speech, Music, and Hearing
    • Invited
  • [Presentation] The VoiceMOS Challenge 2022 (2022)

    • Author(s)
      Erica Cooper, Wen-Chin Huang
    • Organizer
      Special Interest Group on Spoken Language Processing, Information Processing Society of Japan
    • Invited
  • [Remarks] The VoiceMOS Challenge 2022 website

    • URL

      https://voicemos-challenge-2022.github.io

  • [Remarks] Open-source code for SSL-based MOS predictor

    • URL

      https://github.com/nii-yamagishilab/mos-finetune-ssl

Published: 2023-12-25  
