2023 Fiscal Year Annual Research Report

Language-independent, multi-modal, and data-efficient approaches for speech synthesis and translation

Research Project

Project/Area Number: 21K11951
Research Institution: National Institute of Informatics

Principal Investigator

Erica Cooper, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Associate Professor (30843156)

Co-Investigator (Kenkyū-buntansha): Canasai Kruengkrai, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (10895907) [Withdrawn]
Project Period (FY): 2021-04-01 – 2024-03-31
Keywords: text-to-speech synthesis / low-resource languages / speech evaluation
Outline of Annual Research Achievements

We developed methods for text-to-speech (TTS) synthesis for low-resource languages that use smaller amounts of data and can draw on data from less traditional sources. First, we developed an approach to building TTS corpora from podcast data, using Hebrew as a case study, which resulted in a publicly available dataset. Next, we developed a data processing pipeline and TTS system that can be repurposed for other low-resource languages with similar available data, resulting in one peer-reviewed publication at Interspeech 2023. Finally, we continued investigating self-supervised speech representations as an intermediate representation for multilingual TTS models that can be fine-tuned to a new language.

Having previously identified automatic TTS evaluation as a critical issue, especially for low-resource languages, we continued the VoiceMOS Challenge, a shared task on automatic TTS evaluation, by running a second edition focused on zero-shot, multi-domain scenarios. The challenge was presented as a special session at ASRU 2023 and attracted ten teams from academia and industry. We also studied contextual effects on listener ratings, the ability of self-supervised speech models to predict speech quality, and a ranking-based approach to quality prediction, resulting in three additional peer-reviewed publications.

Remarks

Various publicly available datasets, open-source code repositories, and webpages related to the work conducted during this year of the project are listed below.

  • Research Products

    (14 results)


Int'l Joint Research (4 results); Journal Article (1 result, of which Int'l Joint Research: 1, Open Access: 1); Presentation (5 results, of which Int'l Joint Research: 5); Remarks (4 results)

  • [Int'l Joint Research] up.ai (Israel)

    • Country Name
      ISRAEL
    • Counterpart Institution
      up.ai
  • [Int'l Joint Research] University of Edinburgh (United Kingdom)

    • Country Name
      UNITED KINGDOM
    • Counterpart Institution
      University of Edinburgh
  • [Int'l Joint Research] Academia Sinica (Other countries/regions: Taiwan)

    • Country Name
      OTHER COUNTRIES/REGIONS (TAIWAN)
    • Counterpart Institution
      Academia Sinica
  • [Int'l Joint Research] National Research Council (Canada)

    • Country Name
      CANADA
    • Counterpart Institution
      National Research Council
  • [Journal Article] A review on subjective and objective evaluation of synthetic speech (2024)

    • Author(s)
      Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
    • Journal Title

      Acoustical Science and Technology

      Volume: advpub (advance publication); Pages: 1-26

    • DOI

      10.1250/ast.e24.12

    • Open Access / Int'l Joint Research
  • [Presentation] Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction (2024)

    • Author(s)
      Aditya Ravuri, Erica Cooper, Junichi Yamagishi
    • Organizer
      IEEE ICASSP 2024 workshop on Self-supervision in Audio, Speech and Beyond
    • Int'l Joint Research
  • [Presentation] SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion (2023)

    • Author(s)
      Orian Sharoni, Roee Shenberg, Erica Cooper
    • Organizer
      Interspeech 2023
    • Int'l Joint Research
  • [Presentation] Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech (2023)

    • Author(s)
      Erica Cooper, Junichi Yamagishi
    • Organizer
      Interspeech 2023
    • Int'l Joint Research
  • [Presentation] Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting (2023)

    • Author(s)
      Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah
    • Organizer
      ASRU 2023
    • Int'l Joint Research
  • [Presentation] The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains (2023)

    • Author(s)
      Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
    • Organizer
      ASRU 2023
    • Int'l Joint Research
  • [Remarks] SASPEECH: Hebrew speech and transcripts for TTS

    • URL

      https://openslr.org/134/

  • [Remarks] Listening test data for "Range-Equalizing Bias"

    • URL

      https://zenodo.org/records/10005796

  • [Remarks] Implementation of Partial Rank Similarity

    • URL

      https://github.com/nii-yamagishilab/partial_rank_similarity

  • [Remarks] VoiceMOS Challenge 2023 Homepage

    • URL

      https://voicemos-challenge-2023.github.io


Published: 2024-12-25  
