Development of Speech Synthesis System for Controlling Speaker Identity through Text Prompts and Visual Interfaces

Research Project

Project/Area Number	23K20017
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Multi-year Fund
Review Section	1002:Human informatics, applied informatics and related fields
Research Institution	National Institute of Advanced Industrial Science and Technology
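Principal Investigator	須田仁志, National Institute of Advanced Industrial Science and Technology, Department of Information Technology and Human Factors, Researcher (60981438)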
Project Period (FY)	2023-08-31 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000); Fiscal Year 2024: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000); Fiscal Year 2023: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Keywords	Text-to-speech synthesis / Emotional speech synthesis / Voice quality control / Generative AI
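Outline of Research at the Start	When using a speech synthesis system, it is important to select a voice quality (speaker identity) suited to the intended purpose. With conventional methods, however, voice quality is constrained to that of real speakers and is difficult to control freely. To realize text-to-speech synthesis in a desired voice quality, this research develops technology that makes it easy to control the voice quality of synthesized speech through prompts (text describing the voice quality) and a visual interface. The resulting technology will be made available as a web interface, and its effectiveness will be evaluated from multiple perspectives, including the quality of the synthesized speech and usability.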