
Speech Representation Using Emotion-Speaker Controllable Probabilistic Model Based on Extended Boltzmann Distribution

Research Project

Project/Area Number 18K18069
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 61010: Perceptual information processing-related
Research Institution The University of Electro-Communications

Principal Investigator

Nakashika Toru  The University of Electro-Communications, Graduate School of Informatics and Engineering, Associate Professor (90749920)

Project Period (FY) 2018-04-01 – 2021-03-31
Project Status Completed (Fiscal Year 2020)
Budget Amount
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2020: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2019: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2018: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Keywords Speech signal processing / Machine learning / Probability and statistics / Voice conversion / Emotional voice conversion / Emotion conversion
Outline of Final Research Achievements

In speech signal processing, few established methods perform several different tasks, such as speaker recognition and emotion recognition, at the same time. This research focused on the Boltzmann machine, whose high expressive ability lets it represent the relationships among various factors, and examined whether it can realize speaker recognition, emotion recognition, speaker conversion, and emotion conversion simultaneously. The experimental results showed that all four tasks can be achieved using only a Boltzmann machine. They also showed that a Boltzmann machine that represents speakers and emotions jointly outperformed a Boltzmann machine that represents only speakers or only emotions, in both recognition accuracy and voice conversion accuracy.

Academic Significance and Societal Importance of the Research Achievements

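The experimental results of this research suggest the effectiveness of the Boltzmann machine, which can untangle the relationships among various feature factors when its energy function is designed appropriately, and we consider this a meaningful research outcome. As secondary outcomes, we also obtained ideas and findings for new methods, such as a variational autoencoder that directly represents complex-valued data, and multi-task learning of voice conversion and speech recognition using a Boltzmann machine that takes into account the linguistic, physiological, and acoustic chain in speech communication.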

Report

(4 results)
  • 2020 Annual Research Report   Final Research Report ( PDF )
  • 2019 Research-status Report
  • 2018 Research-status Report
  • Research Products

    (33 results)


All Journal Article (3 results) (of which Peer Reviewed: 3 results,  Open Access: 3 results) Presentation (29 results) (of which Int'l Joint Research: 14 results) Patent(Industrial Property Rights) (1 results)

  • [Journal Article] Speech Chain VC: Linking Linguistic and Acoustic Levels via Latent Distinctive Features for RBM-Based Voice Conversion (2020)

    • Author(s)
      KISHIDA Takuya, NAKASHIKA Toru
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E103.D Issue: 11 Pages: 2340-2350

    • DOI

      10.1587/transinf.2020EDP7032

    • NAID

      130007933848

    • ISSN
      0916-8532, 1745-1361
    • Year and Date
      2020-11-01
    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech (2019)

    • Author(s)
      SONE Kentaro, NAKASHIKA Toru
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E102.D Issue: 8 Pages: 1546-1553

    • DOI

      10.1587/transinf.2018EDP7344

    • NAID

      130007686441

    • ISSN
      0916-8532, 1745-1361
    • Year and Date
      2019-08-01
    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra (2019)

    • Author(s)
      Nakashika Toru, Takaki Shinji, Yamagishi Junichi
    • Journal Title

      IEEE/ACM Transactions on Audio, Speech, and Language Processing

      Volume: 27 Issue: 2 Pages: 244-254

    • DOI

      10.1109/taslp.2018.2877465

    • Related Report
      2018 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] Encoding and Generation of Speech Feature Sequences Using Attention RBM (2021)

    • Author(s)
      岸田 拓也,中鹿 亘
    • Organizer
      Acoustical Society of Japan, 2021 Spring Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] A Preliminary Study of Phase Reconstruction Using a Conditional Boltzmann Machine (2021)

    • Author(s)
      羽賀 洋克,矢田部 浩平,岸田 拓也,中鹿 亘
    • Organizer
      Acoustical Society of Japan, 2021 Spring Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] A Study of Real-Time Waveform-Based Voice Conversion Based on VQVAE (2021)

    • Author(s)
      大西 弘太郎,中鹿 亘,松本 光春
    • Organizer
      Acoustical Society of Japan, 2021 Spring Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM (2020)

    • Author(s)
      Takuya Kishida, Shin Tsukamoto, Toru Nakashika
    • Organizer
      Interspeech 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Complex-Valued Variational Autoencoder: A Novel Deep Generative Model for Direct Representation of Complex Spectra (2020)

    • Author(s)
      Toru Nakashika
    • Organizer
      Interspeech 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Gamma Boltzmann Machine for Simultaneously Modeling Linear- and Log-amplitude Spectra (2020)

    • Author(s)
      Toru Nakashika and Kohei Yatabe
    • Organizer
      APSIPA Annual Summit and Conference 2020
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Voice Conversion by Classifying Speaker-Phoneme Interactions Using Cluster ARBM (2020)

    • Author(s)
      岸田 拓也,中鹿 亘
    • Organizer
      Acoustical Society of Japan, 2020 Autumn Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] Simultaneous Conversion of Speaker and Emotion by Separating Speech Information Using Adaptive RBM (2020)

    • Author(s)
      塚本 伸,岸田 拓也,中鹿 亘
    • Organizer
      Acoustical Society of Japan, 2020 Spring Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] HMelGAN: A Fast Neural Vocoder Based on an Adversarial Learning Network with a Hierarchical Structure (2020)

    • Author(s)
      大西 弘太郎,中鹿 亘,松本 光春
    • Organizer
      Acoustical Society of Japan, 2020 Autumn Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] Musical Instrument Sound Conversion via Disentangled Learning Using a Multi-Task Model (2020)

    • Author(s)
      荒川 賢也,岸田 拓也,中鹿 亘
    • Organizer
      Acoustical Society of Japan, 2020 Spring Meeting
    • Related Report
      2020 Annual Research Report
  • [Presentation] A Study of One-Shot Many-to-Many Voice Conversion Using a Boltzmann Machine That Imitates the Speech Chain (2020)

    • Author(s)
      岸田 拓也,中鹿 亘
    • Organizer
      Acoustical Society of Japan, 2020 Spring Meeting
    • Related Report
      2020 Annual Research Report
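  • [Presentation] Simultaneous Conversion of Speaker and Emotion by Separating Speech Information Using Adaptive RBM (2020)

    • Author(s)
      塚本伸,岸田拓也,中鹿亘
    • Organizer
      Acoustical Society of Japan, 2020 Spring Meeting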
    • Related Report
      2019 Research-status Report
  • [Presentation] Musical Instrument Sound Conversion via Disentangled Learning Using a Multi-Task Model (2020)

    • Author(s)
      荒川賢也, 岸田拓也, 中鹿亘
    • Organizer
      Acoustical Society of Japan, 2020 Spring Meeting
    • Related Report
      2019 Research-status Report
  • [Presentation] A Study of One-Shot Many-to-Many Voice Conversion Using a Boltzmann Machine That Imitates the Speech Chain (2020)

    • Author(s)
      岸田拓也,中鹿亘
    • Organizer
      Acoustical Society of Japan, 2020 Spring Meeting
    • Related Report
      2019 Research-status Report
  • [Presentation] STFT spectral loss for training a neural speech waveform model (2019)

    • Author(s)
      Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
    • Organizer
      ICASSP 2019
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Non-Parallel Emotional Voice Conversion Using Adaptive RBM (2019)

    • Author(s)
      塚本伸,岸田拓也,中鹿亘
    • Organizer
      Acoustical Society of Japan, 2019 Autumn Meeting
    • Related Report
      2019 Research-status Report
  • [Presentation] Musical Instrument Sound Conversion Using Fader Networks (2019)

    • Author(s)
      荒川賢也, 岸田拓也, 中鹿亘
    • Organizer
      Acoustical Society of Japan, 2019 Autumn Meeting
    • Related Report
      2019 Research-status Report
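  • [Presentation] Complex-Valued VAE: A Novel Variational Autoencoder That Directly Represents Complex Spectra of Speech (2019)

    • Author(s)
      中鹿亘
    • Organizer
      Acoustical Society of Japan, 2019 Autumn Meeting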
    • Related Report
      2019 Research-status Report
  • [Presentation] Speech Chain VC: Voice Conversion Considering the Linguistic-Physiological-Acoustic Chain of Speech Communication (2019)

    • Author(s)
      岸田拓也,中鹿亘
    • Organizer
      Acoustical Society of Japan, 2019 Autumn Meeting
    • Related Report
      2019 Research-status Report
  • [Presentation] A Study of Phoneme Discrimination Constraints in Many-to-Many Voice Conversion Using VAE (2019)

    • Author(s)
      木庭慶人, 中鹿亘
    • Organizer
      Acoustical Society of Japan, 2019 Spring Meeting
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Improving the Performance of Voice Conversion Using Adaptive RBM with a Semi-Parallel Method (2019)

    • Author(s)
      塚本伸, 中鹿亘
    • Organizer
      Acoustical Society of Japan, 2019 Spring Meeting
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] A Study of Voice Conversion for Unseen Speakers Using a Parallel-Constrained VAE (2019)

    • Author(s)
      大西弘太郎, 中鹿亘
    • Organizer
      Acoustical Society of Japan, 2019 Spring Meeting
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Training a DNN Speech Waveform Model Based on Spectral Sequence Error (2019)

    • Author(s)
      高木信二, 中鹿亘, 山岸順一
    • Organizer
      Acoustical Society of Japan, 2019 Spring Meeting
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
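  • [Presentation] An Extension of the Complex-Valued RBM Considering the Autoregressive Nature of Speech Spectral Sequences (2019)

    • Author(s)
      中鹿亘, 高木信二, 山岸順一
    • Organizer
      Acoustical Society of Japan, 2019 Spring Meeting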
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] DNN-based Speech Synthesis for Small Data Sets Considering Bidirectional Speech-Text Conversion (2018)

    • Author(s)
      Kentaro Sone and Toru Nakashika
    • Organizer
      Interspeech 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory (2018)

    • Author(s)
      Toru Nakashika
    • Organizer
      Interspeech 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Bidirectional Voice Conversion Based on Joint Training Using Gaussian-Gaussian Deep Relational Model (2018)

    • Author(s)
      Kentaro Sone, Shinji Takaki, and Toru Nakashika
    • Organizer
      Odyssey 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Parallel-Data-Free Dictionary Learning for Voice Conversion Using Non-Negative Tucker Decomposition (2018)

    • Author(s)
      Yuki Takashima, Hajime Yano, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
    • Organizer
      ICASSP 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Bidirectional Conversion between Lip Video and Speech Using DRM (2018)

    • Author(s)
      塚本伸, 中鹿亘
    • Organizer
      音学シンポジウム2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Patent(Industrial Property Rights)] Encoding Device, Decoding Device, Parameter Learning Device, and Program (2019)

    • Inventor(s)
      中鹿亘
    • Industrial Property Rights Holder
      中鹿亘
    • Industrial Property Rights Type
      Patent
    • Industrial Property Number
      2019-150516
    • Filing Date
      2019
    • Related Report
      2019 Research-status Report


Published: 2018-04-23   Modified: 2022-01-27  
