
Language-independent, multi-modal, and data-efficient approaches for speech synthesis and translation

Research Project

Project/Area Number 21K11951
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution National Institute of Informatics

Principal Investigator

COOPER Erica  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Associate Professor (30843156)

Co-Investigator (Kenkyū-buntansha) Kruengkrai Canasai  National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (10895907)
Project Period (FY) 2021-04-01 – 2024-03-31
Project Status Completed (Fiscal Year 2023)
Budget Amount
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2023: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2021: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords Text-to-speech synthesis / Low-resource languages / Neural network pruning / Evaluation / text-to-speech synthesis / low-resource languages / speech evaluation / speech synthesis / self-supervised learning / speech assessment / mean opinion score / text-to-speech / vocoder / pruning / efficiency / multi-lingual / machine translation / deep learning / neural networks
Outline of Research at the Start

Language technology has improved due to advances in neural-network-based approaches; for example, speech synthesis has reached the quality of human speech. However, neural models require large quantities of data. Speech technologies bring social benefits in accessibility and communication; to ensure broad access to these benefits, we consider language-independent methods that can make use of less data. We propose 1) articulatory-class-based end-to-end speech synthesis; 2) multi-modal machine translation with text and speech; and 3) neural architecture search for data-efficient architectures.

Outline of Final Research Achievements

We explored pruning for lightweight text-to-speech synthesis (TTS), developed data-efficient TTS for low-resource languages, and advanced the field of automatic quality prediction for TTS. We found that up to 90% of TTS model weights can be pruned without reducing output quality. We developed a data processing pipeline for building TTS corpora for low-resource languages using podcast data, resulting in a large-scale, high-quality, publicly available dataset. We also developed a TTS system using this data that can be repurposed for any language with similar data. As self-supervised speech representations have been effective for many downstream tasks, we next investigated these as an intermediate representation for TTS trained on multilingual data, which can be fine-tuned to a new language. Finally, we identified automatic evaluation of TTS as a critical topic. We launched a series of challenges for this task in 2022 and 2023, which attracted many participants and advanced the field.
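
As a rough illustration of the pruning finding above, the following sketch applies global magnitude (L1) pruning at 90% sparsity using PyTorch's torch.nn.utils.prune utilities. The toy model, layer sizes, and sparsity target are illustrative assumptions only; they do not reproduce the project's actual TTS architectures or training procedure.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a TTS acoustic model (hypothetical; not the project's architecture).
model = nn.Sequential(
    nn.Linear(80, 256),
    nn.ReLU(),
    nn.Linear(256, 80),
)

# Collect the (module, parameter) pairs whose weight matrices will be pruned.
params_to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

# Zero out the 90% of weights with the smallest magnitude, pooled across all layers.
prune.global_unstructured(
    params_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.9,
)

# Fold the pruning masks into the weights so the zeros become permanent.
for module, name in params_to_prune:
    prune.remove(module, name)

# Report the resulting sparsity over the weight matrices.
total = sum(p.numel() for p in model.parameters() if p.dim() > 1)
zeros = sum((p == 0).sum().item() for p in model.parameters() if p.dim() > 1)
print(f"Weight sparsity: {zeros / total:.1%}")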

Academic Significance and Societal Importance of the Research Achievements

We developed TTS systems trainable on small amounts of data, as well as lightweight TTS models, and we advanced the field of TTS evaluation. This benefits researchers and society by lowering barriers to entry for creating TTS for low-resource languages, extending the accessibility benefits of TTS to a broader audience.

Report

(4 results)
  • 2023 Annual Research Report / Final Research Report (PDF)
  • 2022 Research-status Report
  • 2021 Research-status Report
  • Research Products

    (28 results)

Int'l Joint Research (8 results) / Journal Article (1 result; of which Int'l Joint Research: 1, Open Access: 1) / Presentation (12 results; of which Int'l Joint Research: 9, Invited: 3) / Remarks (7 results)

  • [Int'l Joint Research] up.ai (Israel)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] University of Edinburgh (UK)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] Academia Sinica (Taiwan)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] National Research Council (Canada)

    • Related Report
      2023 Annual Research Report
  • [Int'l Joint Research] Academia Sinica (Taiwan)

    • Related Report
      2022 Research-status Report
  • [Int'l Joint Research] National Research Council (Canada)

    • Related Report
      2022 Research-status Report
  • [Int'l Joint Research] University of Edinburgh (UK)

    • Related Report
      2022 Research-status Report
  • [Int'l Joint Research] Massachusetts Institute of Technology/MIT-IBM Watson AI Lab (USA)

    • Related Report
      2021 Research-status Report
  • [Journal Article] A review on subjective and objective evaluation of synthetic speech (2024)

    • Author(s)
      Cooper Erica, Huang Wen-Chin, Tsao Yu, Wang Hsin-Min, Toda Tomoki, Yamagishi Junichi
    • Journal Title

      Acoustical Science and Technology

      Volume: 45 Issue: 4 Pages: 161-183

    • DOI

      10.1250/ast.e24.12

    • ISSN
      0369-4232, 1346-3969, 1347-5177
    • Year and Date
      2024-07-01
    • Related Report
      2023 Annual Research Report
    • Open Access / Int'l Joint Research
  • [Presentation] Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction (2024)

    • Author(s)
      Aditya Ravuri, Erica Cooper, Junichi Yamagishi
    • Organizer
      IEEE ICASSP 2024 workshop on Self-supervision in Audio, Speech and Beyond
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] SASPEECH: A Hebrew Single Speaker Dataset for Text to Speech and Voice Conversion (2023)

    • Author(s)
      Orian Sharoni, Roee Shenberg, Erica Cooper
    • Organizer
      Interspeech 2023
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech (2023)

    • Author(s)
      Erica Cooper, Junichi Yamagishi
    • Organizer
      Interspeech 2023
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised Setting (2023)

    • Author(s)
      Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah
    • Organizer
      ASRU 2023
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains (2023)

    • Author(s)
      Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
    • Organizer
      ASRU 2023
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Generalization Ability of MOS Prediction Networks (2022)

    • Author(s)
      Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi
    • Organizer
      ICASSP 2022
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
  • [Presentation] LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech (2022)

    • Author(s)
      Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda
    • Organizer
      ICASSP 2022
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
  • [Presentation] The VoiceMOS Challenge 2022 (2022)

    • Author(s)
      Wen-Chin Huang, Erica Cooper, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
    • Organizer
      Interspeech 2022
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
  • [Presentation] The VoiceMOS Challenge: Data-Driven Mean Opinion Score Prediction for Synthesized Speech (2022)

    • Author(s)
      Erica Cooper
    • Organizer
      2022 Autumn Meeting of the Acoustical Society of Japan
    • Related Report
      2022 Research-status Report
    • Invited
  • [Presentation] Objective Evaluation in TTS (2022)

    • Author(s)
      Erica Cooper
    • Organizer
      KTH Seminar on Speech Synthesis Evaluation, KTH Royal Institute of Technology, Department of Speech, Music, and Hearing
    • Related Report
      2022 Research-status Report
    • Invited
  • [Presentation] The VoiceMOS Challenge 2022 (2022)

    • Author(s)
      Erica Cooper, Wen-Chin Huang
    • Organizer
      Special Interest Group on Spoken Language Processing, Information Processing Society of Japan
    • Related Report
      2022 Research-status Report
    • Invited
  • [Presentation] On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis (2022)

    • Author(s)
      Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass
    • Organizer
      ICASSP 2022
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Remarks] SASPEECH: Hebrew speech and transcripts for TTS

    • URL

      https://openslr.org/134/

    • Related Report
      2023 Annual Research Report
  • [Remarks] Listening test data for "Range-Equalizing Bias"

    • URL

      https://zenodo.org/records/10005796

    • Related Report
      2023 Annual Research Report
  • [Remarks] Implementation of Partial Rank Similarity

    • URL

      https://github.com/nii-yamagishilab/partial_rank_similarity

    • Related Report
      2023 Annual Research Report
  • [Remarks] VoiceMOS Challenge 2023 Homepage

    • URL

      https://voicemos-challenge-2023.github.io

    • Related Report
      2023 Annual Research Report
  • [Remarks] The VoiceMOS Challenge 2022 website

    • URL

      https://voicemos-challenge-2022.github.io

    • Related Report
      2022 Research-status Report
  • [Remarks] Open-source code for SSL-based MOS predictor

    • URL

      https://github.com/nii-yamagishilab/mos-finetune-ssl

    • Related Report
      2022 Research-status Report
  • [Remarks] TTS Pruning

    • URL

      https://people.csail.mit.edu/clai24/prune-tts/

    • Related Report
      2021 Research-status Report


Published: 2021-04-28   Modified: 2025-01-30  
