Universal, Explainable and Extensible Automatic Evaluation of Synthesized Speech

Research Project

Project/Area Number	25K00143
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	National Institute of Information and Communications Technology
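Principal Investigator	Cooper Erica National Institute of Information and Communications Technology, Universal Communication Research Institute, Advanced Speech Translation Research and Development Promotion Center, Senior Researcher (30843156)
Co-Investigator (Kenkyū-buntansha)	HUANG WENCHIN Nagoya University, Graduate School of Informatics, Assistant Professor (91002385)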
Project Period (FY)	2025-04-01 – 2029-03-31
Project Status	Granted (Fiscal Year 2025)
Budget Amount	¥18,720,000 (Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000)
	Fiscal Year 2028: ¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
	Fiscal Year 2027: ¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
	Fiscal Year 2026: ¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
	Fiscal Year 2025: ¥4,680,000 (Direct Cost: ¥3,600,000, Indirect Cost: ¥1,080,000)
Keywords	speech evaluation / speech synthesis
Outline of Research at the Start	Listening tests for evaluating speech synthesizers are time-consuming and costly, which makes them a significant bottleneck in experimental evaluation; better evaluation methodologies for synthesized speech are needed. To this end, we plan to 1) collect new datasets for automatic quality estimation of synthesized speech; 2) develop quality predictors that consider explainability, context, and generalizability; and 3) continue the international shared-task challenges on speech quality estimation.