Text-to-speech with verbal instructions - realization of a directable virtual voice actor system

Research Project

Project/Area Number	24K21322
Research Category	Grant-in-Aid for Challenging Research (Pioneering)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 61: Human informatics and related fields
Research Institution	Nagoya Institute of Technology
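Principal Investigator	Keiichi Tokuda (徳田恵一), Nagoya Institute of Technology, Graduate School of Engineering, Professor (20217483)
Co-Investigator(Kenkyū-buntansha)	Kei Hashimoto (橋本佳), Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (10635907)
	Yoshihiko Nankaku (南角吉彦), Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497)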
Project Period (FY)	2024-06-28 – 2028-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥25,610,000 (Direct Cost: ¥19,700,000, Indirect Cost: ¥5,910,000)
	Fiscal Year 2027: ¥5,200,000 (Direct Cost: ¥4,000,000, Indirect Cost: ¥1,200,000)
	Fiscal Year 2026: ¥6,240,000 (Direct Cost: ¥4,800,000, Indirect Cost: ¥1,440,000)
	Fiscal Year 2025: ¥6,630,000 (Direct Cost: ¥5,100,000, Indirect Cost: ¥1,530,000)
	Fiscal Year 2024: ¥7,540,000 (Direct Cost: ¥5,800,000, Indirect Cost: ¥1,740,000)
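Keywords	Text-to-speech synthesis / Interface / Prompting
Outline of Research at the Start	The introduction of deep learning has improved the quality of text-to-speech synthesis and has made it easier to realize diverse speaking styles, such as emotional expression. However, a new problem has emerged: how to design an interface for specifying the speaking style of the speech to be generated. Inspired by prompting in image-generation AI, this research aims to build a speech synthesis system in which speaking style and related attributes can be specified in natural language, as if giving directions to a voice actor.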

Report

(1 result)

2024 Comments on the Screening Results