Development of speech enhancement methods for conveying emotions equivalent to face-to-face communication

Research Project

Project/Area Number	19K20618
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 90010: Design-related
Research Institution	The University of Electro-Communications
Principal Investigator	Kishida Takuya The University of Electro-Communications, Graduate School of Informatics and Engineering, Researcher (80827907)
Project Period (FY)	2019-04-01 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000); Fiscal Year 2021: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2020: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000); Fiscal Year 2019: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
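Keywords	speech signal processing / machine learning / voice conversion / emotional voice conversion / voice impression conversion / Boltzmann machine / diffusion probabilistic model / acoustic feature generation / energy-based model / multimodal / speaker-phoneme interaction / sequence representation / emotional speech / emotion perception / face-to-face communication / audiovisual interaction / speech enhancement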
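Outline of Research at the Start	We will film and record utterances in which emotions are expressed at various intensities, and use psychological experiments to examine how audiovisual interaction and speech coding affect emotion perception. We will then run a multivariate analysis that combines the experimental results with the acoustic features of the speech to identify the acoustic features tied to the perception of a speaker's emotion. By applying speech signal processing that manipulates these features, we aim to achieve the project's goal: to clarify how emotion is linked to the acoustic characteristics of speech, and to develop speech enhancement methods that convey emotion as well as face-to-face communication does.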
Outline of Final Research Achievements	In speech communication over communication technologies, paralinguistic information such as emotions, intentions, attitudes, and speaker identity is hard to convey accurately because visual and other supporting cues are absent. In this study, we developed a neural network that models the relationship between paralinguistic information and the acoustic features of speech, and we explored techniques for converting and enhancing speaker identity and emotion. Using the Boltzmann machine and related models, we proposed several methods, including one that converts speaker identity between speakers not included in the model's training data, one that converts speaker identity and emotion simultaneously, and one that decomposes voice into factors so that voice impressions can be converted by manipulating those factors.
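Academic Significance and Societal Importance of the Research Achievements	The experimental results obtained in this research show that Boltzmann machines and related methods are effective for representing the relationship between the acoustic features of speech and nonverbal information. In addition, the results and survey findings on applying diffusion probabilistic models, which have achieved remarkable success in image generation, to voice conversion tasks led to ideas and insights for new methods that allow voice conversion technology to be used more flexibly in speech communication.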

Report

(5 results)

2022 Annual Research Report
Final Research Report (PDF)
2021 Research-status Report
2020 Research-status Report
2019 Research-status Report

Research Products
(31 results)

Journal Article (1 result; Peer Reviewed: 1, Open Access: 1) / Presentation (29 results; Int'l Joint Research: 4) / Remarks (1 result)

[Journal Article] Speech Chain VC: Linking Linguistic and Acoustic Levels via Latent Distinctive Features for RBM-Based Voice Conversion (2020)
- Author(s)
  KISHIDA Takuya, NAKASHIKA Toru
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E103.D Issue: 11 Pages: 2340-2350
- DOI
  10.1587/transinf.2020EDP7032
- NAID
  130007933848
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2020-11-01
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
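[Presentation] Parallel voice conversion using a diffusion probabilistic model conditioned on input features (2023)
- Author(s)
  岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan, Speech Research Meeting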
- Related Report
  2022 Annual Research Report
[Presentation] Phase reconstruction with a Boltzmann machine using an amplitude-weighted energy function (2023)
- Author(s)
  羽賀洋克, 矢田部浩平, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2023 Spring Meeting
- Related Report
  2022 Annual Research Report
[Presentation] Voice conversion that preserves anonymity between speakers using Dual Diffusion Implicit Bridges (2023)
- Author(s)
  奥田耕平, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2023 Spring Meeting
- Related Report
  2022 Annual Research Report
[Presentation] Foreign accent conversion by correcting intonation, rhythm, and pronunciation using Speechsplit (2023)
- Author(s)
  許誠, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2023 Spring Meeting
- Related Report
  2022 Annual Research Report
[Presentation] Non-parallel voice conversion based on free-energy minimization of speaker-conditional restricted Boltzmann machine (2023)
- Author(s)
  Kishida, T., & Nakashika, T.
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) IEEE
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Controllable voice conversion based on quantization of voice factor scores (2023)
- Author(s)
  Isako, T., Onishi, K., Kishida, T., & Nakashika, T.
- Organizer
  Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) IEEE
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
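[Presentation] Non-parallel voice conversion exploiting the equilibration tendency of a conditional restricted Boltzmann machine (2022)
- Author(s)
  岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2022 Autumn Meeting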
- Related Report
  2022 Annual Research Report
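[Presentation] Speaker conversion with controllable voice timbre based on quantization of speaker factor coefficients (2022)
- Author(s)
  井硲巧, 大西弘太郎, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2022 Autumn Meeting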
- Related Report
  2022 Annual Research Report
[Presentation] Refinement of a speech analysis system using an F0-adaptive lag window (2022)
- Author(s)
  越森道貴, 嵯峨山茂樹, 岸田拓也, 中鹿亘
- Organizer
  Ongaku Symposium 2022
- Related Report
  2022 Annual Research Report
[Presentation] Blind source separation based on independent low-rank matrix analysis using a restricted Boltzmann machine (2022)
- Author(s)
  古田翔太郎, 岸田拓也, 中鹿亘
- Organizer
  Ongaku Symposium 2022
- Related Report
  2022 Annual Research Report
[Presentation] VAE voice conversion based on minimizing the cross-entropy error of LSP frequency intervals (2022)
- Author(s)
  平本佳弘, 嵯峨山茂樹, 岸田拓也, 中鹿亘
- Organizer
  Ongaku Symposium 2022
- Related Report
  2022 Annual Research Report
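[Presentation] Pronunciation conversion for foreign-language learners based on Fader Networks with rhythm style taken into account (2022)
- Author(s)
  王庭輝, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2022 Spring Meeting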
- Related Report
  2021 Research-status Report
[Presentation] Forward Attention with a non-stagnation constraint for improving alignment robustness in TTS models (2022)
- Author(s)
  Zhou Yujin, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2022 Spring Meeting
- Related Report
  2021 Research-status Report
[Presentation] Voice impression conversion based on FaderNetworks using impression-word labels (2022)
- Author(s)
  岡留有希, 大西弘太郎, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2022 Spring Meeting
- Related Report
  2021 Research-status Report
[Presentation] Phase reconstruction with a time-series conditional Boltzmann machine (2022)
- Author(s)
  羽賀洋克, 矢田部浩平, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2022 Spring Meeting
- Related Report
  2021 Research-status Report
[Presentation] Face-image-based voice conversion without target-speaker speech using a multimodal VAE (2022)
- Author(s)
  飯田紘崇, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2022 Spring Meeting
- Related Report
  2021 Research-status Report
[Presentation] Generation of acoustic features of speech with a deep energy-based model (2021)
- Author(s)
  岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting
- Related Report
  2021 Research-status Report
[Presentation] Disentangled voice conversion using feature extractors matched to the degree of speaker dependence (2021)
- Author(s)
  井硲巧, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2021 Autumn Meeting
- Related Report
  2021 Research-status Report
[Presentation] Voice conversion for unseen speakers using FaderNetVC with an added speaker feature extractor (2021)
- Author(s)
  井硲巧, 岸田拓也, 中鹿亘
- Organizer
  Ongaku Symposium 2021
- Related Report
  2021 Research-status Report
[Presentation] Encoding and generation of speech feature sequences with an Attention RBM (2021)
- Author(s)
  岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2020 Autumn Meeting
- Related Report
  2020 Research-status Report
[Presentation] Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM (2020)
- Author(s)
  Kishida, T., Tsukamoto, S., Nakashika, T.
- Organizer
  Interspeech 2020
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Voice conversion by classifying speaker-phoneme interactions using a Cluster ARBM (2020)
- Author(s)
  岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2020 Autumn Meeting
- Related Report
  2020 Research-status Report
[Presentation] A study of one-shot many-to-many voice conversion with a Boltzmann machine that mimics the speech chain (2020)
- Author(s)
  岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2020 Spring Meeting
- Related Report
  2019 Research-status Report
[Presentation] Musical instrument sound conversion through disentangled learning with a multitask model (2020)
- Author(s)
  荒川賢也, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2020 Spring Meeting
- Related Report
  2019 Research-status Report
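[Presentation] Simultaneous conversion of speaker and emotion by separating speech information using an adaptive RBM (2020)
- Author(s)
  塚本伸, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2020 Spring Meeting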
- Related Report
  2019 Research-status Report
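[Presentation] Speech chain VC: Voice conversion that considers the linguistic-physiological-acoustic chain of speech communication (2019)
- Author(s)
  岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting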
- Related Report
  2019 Research-status Report
[Presentation] Musical instrument sound conversion using Fader Networks (2019)
- Author(s)
  荒川賢也, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting
- Related Report
  2019 Research-status Report
[Presentation] Non-parallel emotional voice conversion using an adaptive RBM (2019)
- Author(s)
  塚本伸, 岸田拓也, 中鹿亘
- Organizer
  Acoustical Society of Japan 2019 Autumn Meeting
- Related Report
  2019 Research-status Report
[Presentation] Acoustic analysis of word-initial consonant clusters: a perceptual basis of English syllables (2019)
- Author(s)
  Zhang, Y., Nakajima, Y., Yu, X., Remijn, G. B., Ueda, K., Kishida, T., & Elliott, M. A.
- Organizer
  The 35th Annual Meeting of the International Society for Psychophysics
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Remarks] 岸田拓也 Takuya Kishida
- URL
  https://kishidatakuya0119.wixsite.com/mysite
- Related Report
  2019 Research-status Report
