2023 Fiscal Year Annual Research Report

Language-independent, multi-modal, and data-efficient approaches for speech synthesis and translation

Research Project

Project/Area Number	21K11951
Research Institution	National Institute of Informatics
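Principal Investigator	Cooper Erica, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Associate Professor (30843156)
Co-Investigator(Kenkyū-buntansha)	Kruengkrai Canasai, National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (10895907) [Withdrawn]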
Project Period (FY)	2021-04-01 – 2024-03-31
Keywords	text-to-speech synthesis / low-resource languages / speech evaluation
Outline of Annual Research Achievements	We developed methods for text-to-speech (TTS) synthesis for low-resource languages using smaller amounts of data as well as data from less traditional sources. First, we developed an approach to building TTS corpora from podcast data, using the Hebrew language as a case study, resulting in a publicly available dataset. We next developed a data processing pipeline and TTS system that can be repurposed for other low-resource languages that have similar available data, resulting in one peer-reviewed publication at Interspeech 2023. Finally, we continued investigating self-supervised speech representations as an intermediate representation for multilingual TTS that can be fine-tuned to a new language. Having previously identified automatic evaluation of TTS as a critical issue, especially for low-resource languages, we continued the VoiceMOS Challenge, a shared task for automatic TTS evaluation, by running a second edition focusing on zero-shot multi-domain scenarios. The challenge was presented as a special session at ASRU 2023 and attracted ten teams from academia and industry. We also studied contextual effects on listener ratings, the ability of self-supervised speech models to predict speech quality, and a ranking-based quality prediction approach, resulting in three additional peer-reviewed publications.
Remarks	Various publicly available datasets, open-source code repositories, and webpages related to the work conducted during this year of the project are listed under Research Products.

Research Products
(14 results)

All Int'l Joint Research (4 results) Journal Article (1 results) (of which Int'l Joint Research: 1 results, Open Access: 1 results) Presentation (5 results) (of which Int'l Joint Research: 5 results) Remarks (4 results)

[Int'l Joint Research] up.ai (Israel)
- Country Name
  ISRAEL
- Counterpart Institution
  up.ai
[Int'l Joint Research] University of Edinburgh (United Kingdom)
- Country Name
  UNITED KINGDOM
- Counterpart Institution
  University of Edinburgh
[Int'l Joint Research] Academia Sinica (Other countries/regions: Taiwan)
- Country Name
  Other countries/regions
- Counterpart Institution
  Academia Sinica
[Int'l Joint Research] National Research Council (Canada)
- Country Name
  CANADA
- Counterpart Institution
  National Research Council
[Journal Article] A review on subjective and objective evaluation of synthetic speech (2024)
- Author(s)
  Cooper Erica、Huang Wen-Chin、Tsao Yu、Wang Hsin-Min、Toda Tomoki、Yamagishi Junichi
- Journal Title
  Acoustical Science and Technology
  Volume: advpub Pages: 1-26
- DOI
  10.1250/ast.e24.12
- Open Access / Int'l Joint Research
[Presentation] Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction (2024)
- Author(s)
  Aditya Ravuri, Erica Cooper, Junichi Yamagishi
- Organizer
  IEEE ICASSP 2024 workshop on Self-supervision in Audio, Speech and Beyond
- Int'l Joint Research
[Presentation] SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion (2023)
- Author(s)
  Orian Sharoni, Roee Shenberg, Erica Cooper
- Organizer
  Interspeech 2023
- Int'l Joint Research
[Presentation] Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech (2023)
- Author(s)
  Erica Cooper, Junichi Yamagishi
- Organizer
  Interspeech 2023
- Int'l Joint Research
[Presentation] Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting (2023)
- Author(s)
  Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah
- Organizer
  ASRU 2023
- Int'l Joint Research
[Presentation] The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains (2023)
- Author(s)
  Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
- Organizer
  ASRU 2023
- Int'l Joint Research
[Remarks] SASPEECH: Hebrew speech and transcripts for TTS
- URL
  https://openslr.org/134/
[Remarks] Listening test data for "Range-Equalizing Bias"
- URL
  https://zenodo.org/records/10005796
[Remarks] Implementation of Partial Rank Similarity
- URL
  https://github.com/nii-yamagishilab/partial_rank_similarity
[Remarks] VoiceMOS Challenge 2023 Homepage
- URL
  https://voicemos-challenge-2023.github.io
