Can AI Rakugoka entertain people? -Improved expressiveness of rakugo speech synthesis and automatic generation of storytelling

Research Project

Project/Area Number	21K19808
Research Category	Grant-in-Aid for Challenging Research (Exploratory)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 61: Human informatics and related fields
Research Institution	National Institute of Informatics
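Principal Investigator	Yamagishi Junichi National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (70709352)
Co-Investigator (Kenkyū-buntansha)	Cooper Erica National Institute of Informatics, Digital Content and Media Sciences Research Division, Project Assistant Professor (30843156)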
Project Period (FY)	2021-07-09 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000)
Fiscal Year 2022: ¥3,120,000 (Direct Cost: ¥2,400,000, Indirect Cost: ¥720,000)
Fiscal Year 2021: ¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)
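Keywords	speech synthesis / rakugo / deep learning / language generation / speech information processing / machine learning
Outline of Research at the Start	We trained deep learning models on recordings of live performances of rakugo, a traditional Japanese art of comic storytelling, and built a rakugo speech synthesis system, based on state-of-the-art speech synthesis technology, that recites stories as if it were a professional rakugo performer. Its purpose differs entirely from that of conventional spoken dialogue systems: the goal is to realize an AI rakugo performer that entertains listeners. In this project, we improve the expressiveness of the synthesized speech by explicitly modeling long-term acoustic information and nonverbal information, and we work on the automatic generation of stories with neural language models.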
Outline of Final Research Achievements	We conducted machine learning research to build a DNN-based speech synthesis model of a rakugo performer that can generate natural-sounding audio and entertain listeners by performing rakugo like a professional. First, we built Tacotron, Transformer, VITS, and FastPitch speech synthesis models on our rakugo database. We also developed a method for explicitly modeling nonverbal information such as laughter, which occurs frequently in rakugo, and proposed a new method that uses the approximate shape of speech waveforms as input units. Furthermore, because listeners cannot be entertained if rakugo stories are exactly the same every time, we also studied a framework for automatically generating rakugo stories with neural language models such as GPT-2, BART, and T5.
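Academic Significance and Societal Importance of the Research Achievements	The attempt itself, reproducing the traditional storytelling art of rakugo with deep learning in order to realize an AI rakugo performer, differs entirely in purpose from conventional spoken dialogue systems, which aim at conveying information or answering questions, and is a unique and academically significant undertaking. Comparative experiments on the speech synthesis systems we built revealed that conventional evaluation metrics for the naturalness of synthesized speech are not sufficient on their own for an AI rakugo performer to entertain people, and that the evaluation framework, not only the modeling of speech synthesis, needs to change fundamentally. The experiments also showed which of the various end-to-end speech synthesis models (Tacotron, Transformer, FastPitch) is best suited to rakugo speech.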

Report

(3 results)

2022 Annual Research Report
Final Research Report (PDF)
2021 Research-status Report

Research Products
(4 results)

Journal Article (1 result; of which Peer Reviewed: 1, Open Access: 1) / Presentation (2 results; of which Int'l Joint Research: 1, Invited: 2) / Remarks (1 result)

[Journal Article] Generalization Ability of MOS Prediction Networks (2022)
- Author(s)
  Cooper Erica, Huang Wen-Chin, Toda Tomoki, Yamagishi Junichi
- Journal Title
  ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Pages: 8442-8446
- DOI
  10.1109/icassp43922.2022.9746395
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Presentation] Speech Synthesis Research 2.0 (2022)
- Author(s)
  Junichi Yamagishi
- Organizer
  34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), Taiwan
- Related Report
  2022 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] The VoiceMOS Challenge 2022 (2022)
- Author(s)
  Erica Cooper
- Organizer
  Special Interest Group on Spoken Language Processing, Information Processing Society of Japan
- Related Report
  2021 Research-status Report
- Invited
[Remarks] Synthesizing laughter from waveform silhouettes
- URL
  https://arxiv.org/abs/2110.04946
- Related Report
  2021 Research-status Report
