2023 Fiscal Year Annual Research Report

Framework Formation and Systematization of Bespoke Speech Design

Research Project

Project/Area Number	21H04900
Research Institution	Meiji University
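Principal Investigator	森勢将雅, Meiji University, School of Interdisciplinary Mathematical Sciences, Associate Professor (60510013)
Co-Investigator(Kenkyū-buntansha)	田中章浩, Tokyo Woman's Christian University, School of Arts and Sciences, Professor (80396530); 齋藤大輔, The University of Tokyo, Graduate School of Engineering (Faculty of Engineering), Associate Professor (40615150); 高道慎之介, The University of Tokyo, Graduate School of Information Science and Technology, Lecturer (90784330)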
Project Period (FY)	2021-04-05 – 2025-03-31
Keywords	Speech information processing / Speech synthesis / Voice conversion / Speech perception / Speech design
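Outline of Annual Research Achievements	In FY2023, as a prototype of bespoke speech design, we implemented a DNN-based speech synthesizer that, following the idea of point pitch, lets the user control pitch coarsely at the mora level while still generating natural speech. Pitch could also be controlled by designing the pitch contour itself, but as a "bespoke" design approach we place emphasis on the user supplying only rough pitch information. We ran subjective evaluations of the quality of the synthesized speech and of whether the target intonation was reproduced, and confirmed that the proposed method outperforms the existing method. Next, as a study of speech design interfaces, we worked on a GUI that supports the workflow starting from speech analysis. The GUI is not meant to let speech research experts fine-tune every parameter; its concept is that beginners in speech manipulation can modify pitch and other attributes by trial and error. We adopted as the provisional interface a version refined on the basis of use by many participants in a variety of environments and their feedback. A paper on this GUI has been accepted for journal publication. Finally, we studied a new speech evaluation method. Many papers currently use the mean opinion score (MOS) to evaluate the quality of synthesized speech. Because MOS-based subjective evaluation has low power to detect quality differences, and because the quality of synthesized speech has improved markedly, evaluations relying on very large numbers of participants have become more common in recent years, and evaluation costs are rising. In this project we therefore proposed a new method that can detect differences among high-quality synthesized speech samples with fewer participants than MOS. We evaluated the same set of speech samples with both MOS and the proposed method and confirmed that, with the same number of participants, the proposed method detects differences more clearly. These are the representative results; we have also reported many results on speech synthesis, speech corpus construction, voice conversion, and speech perception experiments.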
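As a rough illustration of the MOS-based evaluation discussed in the outline, the sketch below computes a MOS and a normal-approximation 95% confidence interval for two systems. It is not the evaluation method proposed in this project, which the report does not describe in detail; the ratings, the function names `mos_with_ci` and `intervals_overlap`, and the use of z = 1.96 are all assumptions made for illustration.

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Return (mean, half_width) of a normal-approximation confidence interval.

    ratings: opinion scores on the usual 1..5 scale (hypothetical data here).
    z: critical value; 1.96 corresponds to an approximate 95% interval.
    """
    n = len(ratings)
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    half_width = z * sd / math.sqrt(n)
    return mean, half_width


def intervals_overlap(a, b):
    """True if the two (mean, half_width) intervals overlap."""
    (mean_a, half_a), (mean_b, half_b) = a, b
    return (mean_a - half_a) <= (mean_b + half_b) and \
           (mean_b - half_b) <= (mean_a + half_a)


# Hypothetical ratings from ten listeners for two high-quality systems.
system_a = [4, 5, 4, 4, 5, 4, 3, 5, 4, 4]
system_b = [4, 4, 5, 4, 4, 5, 4, 4, 5, 4]

result_a = mos_with_ci(system_a)
result_b = mos_with_ci(system_b)
overlap = intervals_overlap(result_a, result_b)

print(f"System A: MOS = {result_a[0]:.2f} +/- {result_a[1]:.2f}")
print(f"System B: MOS = {result_b[0]:.2f} +/- {result_b[1]:.2f}")
print("Intervals overlap:", overlap)
```

With only ten listeners the two intervals overlap, so this small panel cannot separate the systems. Overlapping intervals are only an informal check rather than a significance test, but the example suggests why MOS studies of near-equal systems tend to need many participants.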
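Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Regarding the technical challenges of the targeted speech design, we have finished building a prototype that meets the minimum requirements for pitch information. The interface development for speech design has also been accepted as a paper, and the goals set at the outset are steadily being achieved. In addition, we have pursued multifaceted work including corpus construction, voice conversion, and speech perception evaluation, and have presented each at academic meetings, so we consider that the research is advancing on a broad front.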
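Strategy for Future Research Activity	Because FY2024 is the final year of this project, we aim to have the various topics studied so far accepted as international conference presentations or journal papers. Specifically, the bespoke speech design prototype has so far undergone only a simple evaluation and has only been presented as a prototype at an academic meeting. Likewise, several other topics have not yet reached journal publication. Turning them into journal papers requires new large-scale subjective evaluation experiments, so in the early part of this fiscal year we will focus on subjective evaluation and then submit to peer-reviewed international conferences and journals, aiming for acceptance.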

Research Products
(42 results)


Journal Article (11 results; of which Peer Reviewed: 10, Open Access: 6) / Presentation (29 results; of which Int'l Joint Research: 4, Invited: 1) / Remarks (2 results)

[Journal Article] Interactive tools for making vocoder-based signal processing accessible: Flexible manipulation of speech attributes for explorational research and education (2024)
- Author(s)
  Kawahara Hideki, Morise Masanori
- Journal Title
  Acoustical Science and Technology
  Volume: 45 Pages: 48-51
- DOI
  10.1250/ast.e23.52
- Peer Reviewed / Open Access
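[Journal Article] The Shape of the Future as Seen from Human Communication Research (2024)
- Author(s)
  新井田統, 小森智康, 酒向慎司, 田中章浩, 布川清彦
- Journal Title
  Journal of the Institute of Electronics, Information and Communication Engineers (電子情報通信学会誌)
  Volume: 107 Pages: 237-243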
[Journal Article] Emotion-controllable Speech Synthesis Using Emotion Soft Label, Utterance-level Prosodic Factors, and Word-level Prominence (2024)
- Author(s)
  Luo Xuan, Takamichi Shinnosuke, Saito Yuki, Koriyama Tomoki, Saruwatari Hiroshi
- Journal Title
  APSIPA Transactions on Signal and Information Processing
  Volume: 13 Pages: 1-30
- DOI
  10.1561/116.00000242
- Peer Reviewed / Open Access
[Journal Article] Parameter representation of group delay towards glottal-flow-based phase manipulation for channel vocoder (2023)
- Author(s)
  Koguchi Junya, Morise Masanori, Kawahara Hideki
- Journal Title
  Acoustical Science and Technology
  Volume: 44 Pages: 189-192
- DOI
  10.1250/ast.44.189
- Peer Reviewed / Open Access
[Journal Article] Effects of Humans' and Robots' Multisensory Emotional Expressions by Body Language and Voice on Human Altruistic Behavior (2023)
- Author(s)
  Sawada Yoshiko, Kawahara Misako, Tanaka Akihiro
- Journal Title
  Transactions of Japan Society of Kansei Engineering
  Volume: 22 Pages: 405-416
- DOI
  10.5057/jjske.TJSKE-D-23-00024
- Peer Reviewed / Open Access
[Journal Article] COCO-NUT: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-Based Control (2023)
- Author(s)
  Watanabe Aya, Takamichi Shinnosuke, Saito Yuki, Nakata Wataru, Xin Detai, Saruwatari Hiroshi
- Journal Title
  Proc. ASRU
  Volume: - Pages: 1-8
- DOI
  10.1109/ASRU57964.2023.10389693
- Peer Reviewed / Open Access
[Journal Article] HumanDiffusion: diffusion model using perceptual gradients (2023)
- Author(s)
  Ueda Yota, Takamichi Shinnosuke, Saito Yuki, Takamune Norihiro, Saruwatari Hiroshi
- Journal Title
  Proc. INTERSPEECH 2023
  Volume: - Pages: 4264-4268
- DOI
  10.21437/Interspeech.2023-1680
- Peer Reviewed / Open Access
[Journal Article] jaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus (2023)
- Author(s)
  Nakamura Tomohiko, Takamichi Shinnosuke, Tanji Naoko, Fukayama Satoru, Saruwatari Hiroshi
- Journal Title
  Proc. ICASSP 2023
  Volume: - Pages: 1-5
- DOI
  10.1109/ICASSP49357.2023.10095569
- Peer Reviewed
[Journal Article] MID-Attribute Speaker Generation Using Optimal-Transport-Based Interpolation of Gaussian Mixture Models (2023)
- Author(s)
  Watanabe Aya, Takamichi Shinnosuke, Saito Yuki, Xin Detai, Saruwatari Hiroshi
- Journal Title
  Proc. ICASSP 2023
  Volume: - Pages: 1-5
- DOI
  10.1109/ICASSP49357.2023.10097113
- Peer Reviewed
[Journal Article] Visual Onoma-to-Wave: Environmental Sound Synthesis from Visual Onomatopoeias and Sound-Source Images (2023)
- Author(s)
  Ohnaka Hien, Takamichi Shinnosuke, Imoto Keisuke, Okamoto Yuki, Fujii Kazuki, Saruwatari Hiroshi
- Journal Title
  Proc. ICASSP 2023
  Volume: - Pages: 1-5
- DOI
  10.1109/ICASSP49357.2023.10096517
- Peer Reviewed
[Journal Article] VTTS: Visual-Text To Speech (2023)
- Author(s)
  Nakano Yoshifumi, Saeki Takaaki, Takamichi Shinnosuke, Sudoh Katsuhito, Saruwatari Hiroshi
- Journal Title
  Proc. SLT 2023
  Volume: - Pages: 936-942
- DOI
  10.1109/SLT54892.2023.10022739
- Peer Reviewed
[Presentation] jMARS Recorder: Development and Study of a Speech Recording App Specialized for Corpus Reading (2024)
- Author(s)
  俣野文義
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting
[Presentation] A Study of Data Augmentation for Statistical Fundamental Frequency Estimation (2024)
- Author(s)
  小口純矢
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting
[Presentation] Automatic Correspondence Mapping in Speech Morphing: Proposal and Quality Evaluation (2024)
- Author(s)
  堀部貴紀
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting
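[Presentation] Acquiring Embeddings of Speech and Voice-Characteristics Descriptions with a Contrastive Learning Model (2024)
- Author(s)
  渡邊亞椰
- Organizer
  Acoustical Society of Japan 2024 Spring Meeting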
[Presentation] An Experimental Study of Speaker Embedding Spaces for Speech Synthesis with Controllable Speaker Identity (2024)
- Author(s)
  森田湧大
- Organizer
  IEICE Technical Committee on Speech
[Presentation] Analysis of Speech Synthesis Using Self-Supervised Learning Models for Textless Speech: Focusing on Multilingual Use (2024)
- Author(s)
  朴浚鎔
- Organizer
  IEICE Technical Committee on Speech
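[Presentation] Chest-Voice-to-Falsetto Conversion Focusing on Glottal Source Waveform Shape, and Control of Source Waveform Parameters (2024)
- Author(s)
  岡田翔太
- Organizer
  IEICE Technical Committee on Speech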
[Presentation] Predicting the Speech Modification Strength That Matches the Impression of a Voice Agent (2024)
- Author(s)
  宮本蓮
- Organizer
  IEICE Technical Committee on Speech
[Presentation] A Preliminary Study of Japanese Corpus Sentence Generation Using ChatGPT (2023)
- Author(s)
  石川真大
- Organizer
  IPSJ Special Interest Group on Music and Computer
[Presentation] Expressing Disgust with DNN-Based Speech Synthesis and a Preliminary Evaluation (2023)
- Author(s)
  俣野文義
- Organizer
  IPSJ Special Interest Group on Music and Computer
[Presentation] Prototype of a Japanese Text-to-Speech System Premised on Speech Design with Mora-Level Pitch Control (2023)
- Author(s)
  森勢将雅
- Organizer
  IPSJ Special Interest Group on Music and Computer
[Presentation] Acoustic Feature Analysis of Japanese Speech Expressing Disgust (2023)
- Author(s)
  俣野文義
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting
[Presentation] On Improving the MUSHRA Method toward Reference-Free Relative Sound Quality Evaluation (2023)
- Author(s)
  田鎖佑弥
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting
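[Presentation] Coco-Nut: A Paired Dataset of Multi-Speaker Speech and Free-Form Voice-Characteristics Descriptions for Voice Quality Control via Free-Form Text (2023)
- Author(s)
  渡邊亞椰
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting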
[Presentation] Are There Associations Between Voice and Color? (2023)
- Author(s)
  M. Kuwa
- Organizer
  SARMAC XIV
- Int'l Joint Research
[Presentation] Multisensory emotion perception and its cultural differences (2023)
- Author(s)
  A. Tanaka
- Organizer
  Cognitive Psychology Colloquium at Leiden University
- Int'l Joint Research / Invited
[Presentation] Immigration Modulates Audiovisual Emotion Integration in Adults: The Effect of the Host Culture and Migration Itself (2023)
- Author(s)
  A. K. Nakamura
- Organizer
  The 21st International Multisensory Research Forum
- Int'l Joint Research
[Presentation] The Prototypical Expressions Can Facilitate the Perception of Various Positive Emotions through Face, Voice, and Touch (2023)
- Author(s)
  R. Oya
- Organizer
  The 21st International Multisensory Research Forum
- Int'l Joint Research
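[Presentation] Perception of Higher-Order Emotions from Bodily and Vocal Expressions of an Android Robot (2023)
- Author(s)
  山本寿子
- Organizer
  40th Annual Meeting of the Japanese Cognitive Science Society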
[Presentation] Introduction of International Society for Research on Emotion (ISRE) (2023)
- Author(s)
  A. Tanaka
- Organizer
  2023 Annual Conference of the Japanese Society for Artificial Intelligence
[Presentation] Effects of Male Pharmacists' Grooming on Patients' Trust in Pharmacies (2023)
- Author(s)
  高橋利供
- Organizer
  41st Annual Meeting of the Japanese Society of Social Pharmacy
[Presentation] An Analytical Study of Input Representations for Speech Synthesis Systems (2023)
- Author(s)
  朴浚鎔
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting
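[Presentation] Comparing Speaker Embedding Spaces of Differing Expressiveness with Subjective Inter-Speaker Similarity (2023)
- Author(s)
  森田湧大
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting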
[Presentation] Integration of Throat Microphone Recording and Bandwidth Extension for Robust Assessment of L2 Speech (2023)
- Author(s)
  Yu Xu
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting
[Presentation] Emotion transfer with controllable intensity for emotional speech synthesis based on self-supervised model (2023)
- Author(s)
  Wei Li
- Organizer
  Acoustical Society of Japan 2023 Autumn Meeting
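[Presentation] A Comparative Study of Methods for Constructing Speaker Embedding Spaces, Focusing on Their Relation to Perceptual Inter-Speaker Similarity (2023)
- Author(s)
  森田湧大
- Organizer
  IEICE Technical Report, Technical Committee on Speech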
[Presentation] Improvement of Tacotron2 text-to-speech model based on masking operation and positional attention mechanism (2023)
- Author(s)
  Tong Ma
- Organizer
  IEICE Technical Committee on Speech
[Presentation] Integration of Throat Microphone Recording and Bandwidth Extension for Robust Assessment of L2 Listening (2023)
- Author(s)
  Yu Xu
- Organizer
  IEICE Technical Committee on Speech
[Presentation] Self-supervised learning model based emotion transfer and intensity control technology for expressive speech synthesis (2023)
- Author(s)
  Wei Li
- Organizer
  IEICE Technical Committee on Speech
[Remarks] A Paired Dataset of Multi-Speaker Speech and Free-Form Voice-Characteristics Descriptions for Voice Quality Control via Free-Form Text
- URL
  https://sites.google.com/site/shinnosuketakamichi/research-topics/coconut_corpus
[Remarks] jaCappella corpus
- URL
  https://tomohikonakamura.github.io/jaCappella_corpus/
