Everyday conversation speech synthesis

Research Project

Project/Area Number	22K12107
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61020: Human interface and interaction-related
Research Institution	Utsunomiya University
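Principal Investigator	Hiroki Mori (森大毅), Utsunomiya University, School of Engineering, Associate Professor (10302184)
Co-Investigator(Kenkyū-buntansha)	Yoshiko Arimoto (有本泰子), Chiba Institute of Technology, Faculty of Information and Computer Science, Associate Professor (60586957)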
Project Period (FY)	2022-04-01 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2024: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2023: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2022: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
Keywords	spontaneous speech / conversational speech / conversational speech synthesis / prosody
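Outline of Research at the Start	With the advent of deep learning, synthetic speech has reached a quality that can no longer be distinguished from a natural human voice. However, the speech data used to build existing speech synthesis systems consists of read-aloud recordings of specified texts, which differ qualitatively from conversational speech. To bring spoken communication between humans and machines closer to that between humans, a technology is needed for synthesizing speech that has the characteristics of conversational speech. This research attempts to synthesize speech like that heard in everyday conversation, entirely unlike conventional synthetic speech, by making effective use of a large-scale conversation corpus, the Corpus of Everyday Japanese Conversation.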
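Outline of Annual Research Achievements	The aim of this research is high-quality synthesis of conversational speech using the Corpus of Everyday Japanese Conversation (CEJC). When end-to-end speech synthesis is applied to a corpus with poor recording quality such as CEJC, the model reproduces the poor sound as it is. In this research, CEJC is used only for training the prosody model, while a separate high-quality speech corpus is used to train the spectral model, with the goal of speech synthesis that has the prosody of conversational speech while retaining quality equivalent to read-speech synthesis. In fiscal year 2023 (Reiwa 5), we continued to investigate layered modeling of prosody and spectrum with end-to-end speech synthesis. Starting from a FastSpeech 2 model trained simply on CEJC, we trained a hybrid model by fine-tuning on JSUT, a high-quality speech corpus, while freezing the weights of the variance adaptor (the fo, intensity, and duration predictors). When the synthetic speech was compared by listening, the hybrid model produced speech with less noise than the CEJC model. A listening experiment comparing the hybrid model with a model trained on CEJC only and a model trained on JSUT only confirmed that the hybrid model improves sound quality while retaining the character of everyday conversational speech. In an evaluation with 10 participants in their twenties, the CEJC model and the hybrid model were rated higher than the JSUT model for resemblance to everyday conversational speech. The hybrid model also synthesized clearer speech than the CEJC model while retaining a comparable degree of resemblance to everyday conversational speech.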
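The fine-tuning scheme described above (update everything except the variance adaptor) can be illustrated with a minimal, framework-free sketch. This is not the project's code: the parameter names, the `variance_adaptor.` prefix, the scalar "weights", and the plain SGD update are all assumptions made for illustration. In PyTorch the same effect is commonly obtained by setting `requires_grad = False` on the parameters to be frozen.

```python
# Hypothetical parameter-name prefix for the variance adaptor; real
# FastSpeech 2 implementations may name their modules differently.
FROZEN_PREFIXES = ("variance_adaptor.",)


def split_parameters(names, frozen_prefixes=FROZEN_PREFIXES):
    """Partition parameter names into (frozen, trainable) lists."""
    prefixes = tuple(frozen_prefixes)
    frozen = [n for n in names if n.startswith(prefixes)]
    trainable = [n for n in names if not n.startswith(prefixes)]
    return frozen, trainable


def sgd_step(params, grads, names_to_update, lr=0.1):
    """Plain SGD update applied only to the listed parameters."""
    updated = dict(params)
    for name in names_to_update:
        updated[name] = params[name] - lr * grads[name]
    return updated


# Toy scalar "weights" standing in for a model pre-trained on conversational data.
params = {
    "encoder.w": 1.0,
    "variance_adaptor.pitch_predictor.w": 2.0,
    "variance_adaptor.energy_predictor.w": 3.0,
    "variance_adaptor.duration_predictor.w": 4.0,
    "decoder.w": 5.0,
}
# Toy gradients standing in for those computed on the high-quality corpus.
grads = {name: 1.0 for name in params}

frozen, trainable = split_parameters(params)
fine_tuned = sgd_step(params, grads, trainable)
```

After the step, the encoder and decoder weights have moved while the three predictor weights keep their pre-trained values, which is the property the hybrid model relies on to preserve conversational prosody.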
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: We found that fine-tuning FastSpeech 2 with the weights of the variance adaptor frozen is an effective way to transfer speech features other than prosody, such as the spectrum, from a read-speech corpus, and thereby improved the quality of speech synthesis that has the prosody of conversational speech.
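Strategy for Future Research Activity	[Importance of prosodic labels for conversational-speech likeness] An end-to-end model directly models the relationship between text and the speech waveform, but it cannot model the diversity of prosody associated with paralinguistic information, which is important in conversational speech and is lost when speech is reduced to text. We will therefore train a model with the prosodic label information included in part of the CEJC conversations added to the text, and compare its synthetic speech with that of the conventional model to examine whether prosodic labels are essential for conversational speech synthesis. [Synthesis of listener responses] In conversation, backchannels and affect-expressing interjections play a large role. However, FastSpeech 2 synthesizes these short utterances with low quality and cannot reproduce their paralinguistic diversity. We will therefore extract paralinguistic embeddings in an unsupervised manner using Global Style Tokens, build a module specialized for synthesizing listener responses, and condition it on the context of the other speaker's utterances to reproduce paralinguistic diversity.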

Report

(2 results)

2023 Research-status Report
2022 Research-status Report

Research Products
(30 results)


All Journal Article (13 results) (of which Peer Reviewed: 10 results, Open Access: 13 results) Presentation (17 results) (of which Invited: 2 results)

[Journal Article] Determining the base frequency of the F0 contour generation model for the diverse expression of speech (2025)
- Author(s)
  Yoshiko Arimoto, Yasuo Horiuchi, Sumio Ohno
- Journal Title
  
  Acoustical Science and Technology
  
  Volume: 1
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Acoustic differences between laughter and screams in spontaneous dialog (2024)
- Author(s)
  Matsuda Takuto, Arimoto Yoshiko
- Journal Title
  
  Acoustical Science and Technology
  
  Volume: 45 Issue: 3 Pages: 135-146
- DOI
  10.1250/ast.e23.58
- ISSN
  0369-4232, 1346-3969, 1347-5177
- Year and Date
  2024-05-01
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Phonetic analysis on speech-laugh occurrence in spontaneous gaming dialog (2023)
- Author(s)
  Arimoto Yoshiko
- Journal Title
  
  Acoustical Science and Technology
  
  Volume: 44 Issue: 1 Pages: 36-39
- DOI
  10.1250/ast.44.36
- ISSN
  0369-4232, 1346-3969, 1347-5177
- Year and Date
  2023-01-01
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] A Generative Framework for Conversational Laughter: Its 'Language Model' and Laughter Sound Synthesis (2023)
- Author(s)
  Mori Hiroki, Kimura Shunya
- Journal Title
  
  Proceedings of Interspeech 2023
  
  Volume: - Pages: 3372-3376
- DOI
  10.21437/interspeech.2023-2453
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Detection of Laughter and Screaming Using the Attention and CTC Models (2023)
- Author(s)
  Matsuda Takuto, Arimoto Yoshiko
- Journal Title
  
  Proceedings of Interspeech 2023
  
  Volume: - Pages: 1025-1029
- DOI
  10.21437/interspeech.2023-1412
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Why, when, and how do we laugh? (2022)
- Author(s)
  森大毅
- Journal Title
  
  THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN
  
  Volume: 79 Issue: 1 Pages: 57-63
- DOI
  10.20697/jasj.79.1_57
- ISSN
  0369-4232, 2432-2040
- Year and Date
  2022-12-25
- Related Report
  2022 Research-status Report
- Open Access
[Journal Article] Use or produce?: Corpus construction for affective speech analysis (2022)
- Author(s)
  有本泰子
- Journal Title
  
  THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN
  
  Volume: 79 Issue: 1 Pages: 64-71
- DOI
  10.20697/jasj.79.1_64
- ISSN
  0369-4232, 2432-2040
- Year and Date
  2022-12-25
- Related Report
  2022 Research-status Report
- Open Access
[Journal Article] Comparison of machine learning algorithms and acoustic features in emotion recognition from spontaneous speech (2022)
- Author(s)
  Takahisa Iizuka, Hiroki Mori
- Journal Title
  
  Acoustical Science and Technology
  
  Volume: 43 Issue: 4 Pages: 228-231
- DOI
  10.1250/ast.43.228
- ISSN
  0369-4232, 1346-3969, 1347-5177
- Year and Date
  2022-07-01
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] How should a dialog system speak? Implications for speech synthesis from real conversations (2022)
- Author(s)
  森大毅
- Journal Title
  
  THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN
  
  Volume: 78 Issue: 5 Pages: 283-288
- DOI
  10.20697/jasj.78.5_283
- ISSN
  0369-4232, 2432-2040
- Year and Date
  2022-05-01
- Related Report
  2022 Research-status Report
- Open Access
[Journal Article] Laughter Components Estimation Using Emotional Information towards Natural and Expressive Laughter Synthesis (2022)
- Author(s)
  有本泰子, 今西利於, 森大毅
- Journal Title
  
  IPSJ Journal (情報処理学会論文誌)
  
  Volume: 63 Issue: 4 Pages: 1159-1169
- DOI
  10.20729/00217618
- Year and Date
  2022-04-15
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] How does a spontaneously speaking conversational agent affect user behavior? (2022)
- Author(s)
  Takahisa Iizuka, Hiroki Mori
- Journal Title
  
  IEEE Access
  
  Volume: 10 Pages: 111042-111051
- DOI
  10.1109/access.2022.3214977
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Neural conversational speech synthesis with flexible control of emotion dimensions (2022)
- Author(s)
  Hiroki Mori, Hironao Nishino
- Journal Title
  
  Proc. APSIPA ASC 2022
  
  Volume: - Pages: 432-436
- DOI
  10.23919/apsipaasc55919.2022.9980105
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Acoustic discriminability of unconscious laughter and scream during game-play (2022)
- Author(s)
  Matsuda Takuto, Arimoto Yoshiko
- Journal Title
  
  Proc. Speech Prosody 2022
  
  Volume: - Pages: 575-579
- DOI
  10.21437/speechprosody.2022-117
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
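[Presentation] Synthesis of everyday conversational speech by hybrid modeling of a speech synthesis corpus and an everyday conversation corpus (2024)
- Author(s)
  古川晃大, 森大毅
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting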
- Related Report
  2023 Research-status Report
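[Presentation] Comparison across dialog domains of the initial phonemes of speech-laughs using the everyday conversation corpus (2024)
- Author(s)
  有本泰子, 神津宏尚
- Organizer
  National Institute for Japanese Language and Linguistics (NINJAL) "Everyday Conversation Corpus" Symposium IX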
- Related Report
  2023 Research-status Report
[Presentation] Acoustic analysis of speech-laugh initial phonemes using mel-cepstrum (2024)
- Author(s)
  瀬戸口遼, 有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2024 Spring Meeting
- Related Report
  2023 Research-status Report
[Presentation] Emotion-dimension evaluation of screams using crowdsourcing (2024)
- Author(s)
  大石暖, 大久保港, 有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2024 Spring Meeting
- Related Report
  2023 Research-status Report
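[Presentation] Acoustic analysis of co-occurring laughter and its effect on physiological responses (2024)
- Author(s)
  飯田真広, 有本泰子
- Organizer
  Acoustical Society of Japan Speech Research Meeting (ASJ-SP) Technical Report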
- Related Report
  2023 Research-status Report
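[Presentation] Speech recognition capable of detecting laughter and screams using wav2vec 2.0 (2024)
- Author(s)
  松田匠翔, 有本泰子
- Organizer
  Acoustical Society of Japan Speech Research Meeting (ASJ-SP) Technical Report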
- Related Report
  2023 Research-status Report
[Presentation] When and how should a conversational agent laugh? Implications from research on human laughter (2023)
- Author(s)
  森大毅
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting
- Related Report
  2023 Research-status Report
- Invited
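[Presentation] Physiological evaluation of event presentation toward developing a game system that responds to laughter (2023)
- Author(s)
  倉澤瑞, 福田樹人, 有本泰子
- Organizer
  Japanese Society for Artificial Intelligence, Special Interest Group on Spoken Language Understanding and Dialogue Processing (SIG-SLUD), 99th Meeting, "14th Dialogue System Symposium"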
- Related Report
  2023 Research-status Report
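[Presentation] Acoustic analysis of initial vowels toward elucidating the mechanism of speech-laugh occurrence (2023)
- Author(s)
  瀬戸口遼, 有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2023 Autumn Meeting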
- Related Report
  2023 Research-status Report
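[Presentation] A study of end-to-end speech recognition capable of detecting spontaneous laughter and screams (2023)
- Author(s)
  松田匠翔, 有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2023 Autumn Meeting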
- Related Report
  2023 Research-status Report
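[Presentation] Beyond research on affective vocalization: recognition and synthesis of laughter and screams, and interaction (2023)
- Author(s)
  有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2023 Autumn Meeting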
- Related Report
  2023 Research-status Report
- Invited
[Presentation] Scream synthesis using data augmentation by voice conversion (2023)
- Author(s)
  白鳥恵大, 有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2023 Spring Meeting
- Related Report
  2022 Research-status Report
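[Presentation] Control of phonetic symbol representations and acoustic features by emotion dimensions in laughter synthesis (2022)
- Author(s)
  木村駿野, 森大毅
- Organizer
  Acoustical Society of Japan 2022 Autumn Meeting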
- Related Report
  2022 Research-status Report
[Presentation] Building an end-to-end detection model for spontaneous laughter and screams using a BiLSTM-CTC model (2022)
- Author(s)
  松田匠翔, 有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2022 Autumn Meeting
- Related Report
  2022 Research-status Report
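[Presentation] Relationship between the classification of spontaneous screams based on emotion perception characteristics and acoustic features (2022)
- Author(s)
  大久保港, 井岸渉, 有本泰子
- Organizer
  Proceedings of the Acoustical Society of Japan 2022 Autumn Meeting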
- Related Report
  2022 Research-status Report
[Presentation] Scream annotation for spontaneous dialog speech (2022)
- Author(s)
  白鳥恵大, 大久保港, 松田匠翔, 有本泰子
- Organizer
  Language Resources Workshop 2022
- Related Report
  2022 Research-status Report
[Presentation] Analysis of the timing of speech-laugh occurrence in various dialog situations (2022)
- Author(s)
  有本泰子, 真弓花
- Organizer
  Language Resources Workshop 2022
- Related Report
  2022 Research-status Report
