Pronunciation and accent modeling for multi-dialect speech synthesis

Research Project

Project/Area Number	18K18100
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	The University of Tokyo
Principal Investigator	Takamichi Shinnosuke, The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90784330)
Project Period (FY)	2018-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2020: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2019: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000); Fiscal Year 2018: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords	speech synthesis / dialect / prosody / deep learning / natural language processing
Outline of Final Research Achievements	The purpose of this research is to artificially synthesize speech in any Japanese dialect. To this end, we developed (1) a method that enables robust speech synthesis from noisy recorded speech, (2) a speech synthesis method that controls accents using geographic information about dialects, (3) a method for acquiring the linguistic units that make up accents without linguistic knowledge, (4) a method for acquiring dialectal accents without linguistic knowledge, and (5) a method for realizing dialectal speech synthesis without linguistic knowledge. We also (6) released a free speech database for dialectal speech synthesis.
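Academic Significance and Societal Importance of the Research Achievements	This research aims to artificially synthesize speech in any Japanese dialect. For Japanese dialects in danger of extinction, computationally preserving their characteristics is useful across a wide range of applications, from the preservation of spoken-language culture to content production. Toward this goal, the research addressed, from multiple angles, methods that can synthesize dialectal speech without knowledge of the dialect, and also built a publicly available dialect database.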

Report

(5 results)

2021 Annual Research Report
Final Research Report (PDF)
2020 Research-status Report
2019 Research-status Report
2018 Research-status Report

Research Products
(10 results)


Journal Article (1 result) (of which Peer Reviewed: 1 result, Open Access: 1 result) Presentation (9 results) (of which Int'l Joint Research: 4 results, Invited: 1 result)

[Journal Article] Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis (2021)
- Author(s)
  Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, and Hiroshi Saruwatari
- Journal Title
  Speech Communication
  Volume: 125
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Presentation] Accent Modeling of Low-Resourced Dialect in Pitch Accent Language Using Variational Autoencoder (2021)
- Author(s)
  Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi and Hiroshi Saruwatari
- Organizer
  The 11th ISCA SSW
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
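[Presentation] Dialect speech synthesis using latent-variable accent representations based on VQ-VAE (2021)
- Author(s)
  Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Proceedings of the Acoustical Society of Japan 2021 Autumn Meeting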
- Related Report
  2021 Annual Research Report
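[Presentation] Evaluation of sentence-level generation in dialect speech synthesis using accent latent variables (2021)
- Author(s)
  Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  IEICE Technical Report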
- Related Report
  2021 Annual Research Report
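[Presentation] A study of latent-variable accent representations using a variational autoencoder (2020)
- Author(s)
  Kazuya Yufune, Tomoki Koriyama, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  Proceedings of the Acoustical Society of Japan 2020 Autumn Meeting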
- Related Report
  2020 Research-status Report
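[Presentation] Evaluation of acoustic-model-likelihood-based subword tokenization in terms of prosody estimation accuracy (2020)
- Author(s)
  Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari
- Organizer
  Proceedings of the Acoustical Society of Japan 2020 Spring Meeting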
- Related Report
  2019 Research-status Report
[Presentation] Subword tokenization based on DNN-based acoustic model for end-to-end prosody generation (2019)
- Author(s)
  Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari
- Organizer
  ISCA SSW
- Related Report
  2019 Research-status Report
- Int'l Joint Research
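[Presentation] Subword tokenization based on a DNN acoustic model toward end-to-end prosody estimation (2019)
- Author(s)
  Masashi Aso, Shinnosuke Takamichi, Norihiro Takamune, Hiroshi Saruwatari
- Organizer
  Proceedings of the Acoustical Society of Japan 2019 Autumn Meeting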
- Related Report
  2019 Research-status Report
[Presentation] Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis (2018)
- Author(s)
  Takanori Akiyama, Shinnosuke Takamichi, Hiroshi Saruwatari
- Organizer
  APSIPA ASC
- Related Report
  2018 Research-status Report
- Int'l Joint Research
[Presentation] Generative approach using the noise generation models for DNN-based speech synthesis trained from noisy speech (2018)
- Author(s)
  Masakazu Une, Yuki Saito, Shinnosuke Takamichi, Daichi Kitamura, Ryoichi Miyazaki, Hiroshi Saruwatari
- Organizer
  APSIPA ASC
- Related Report
  2018 Research-status Report
- Int'l Joint Research / Invited
