Research on Cross-Modal Voice Cloning Speech Synthesis Technology for Reproducing Voices from Face Images

Research Project

Project/Area Number	24K02959
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61010: Perceptual information processing-related
Research Institution	Nagoya Institute of Technology
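Principal Investigator	橋本 佳, Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (10635907)
Co-Investigator (Kenkyū-buntansha)	南角 吉彦, Nagoya Institute of Technology, Graduate School of Engineering, Associate Professor (80397497); 徳田 恵一, Nagoya Institute of Technology, Graduate School of Engineering, Professor (20217483)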
Project Period (FY)	2024-04-01 – 2027-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥18,720,000 (Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000); Fiscal Year 2026: ¥5,980,000 (Direct Cost: ¥4,600,000, Indirect Cost: ¥1,380,000); Fiscal Year 2025: ¥6,240,000 (Direct Cost: ¥4,800,000, Indirect Cost: ¥1,440,000); Fiscal Year 2024: ¥6,500,000 (Direct Cost: ¥5,000,000, Indirect Cost: ¥1,500,000)
Keywords	Speech synthesis
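Outline of Research at the Start	This research aims to establish cross-modal voice cloning technology that predicts a person's voice from a face image and makes it possible to build a speech synthesis system reproducing that voice, even when no speech data is available. To this end, it will establish techniques for modeling the relationship between speech and face images, and techniques for generating speech with diverse voice qualities based on information obtained from face images. Through this research, we aim to reproduce, without using any speech data, the voices of people who have lost their own voice through accidents or other causes, and to enable natural communication in their own voice.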