Establishment of speech synthesis framework based on Gaussian process regression

Research Project

Project/Area Number	15H02724
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perceptual information processing
Research Institution	Tokyo Institute of Technology
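Principal Investigator	Kobayashi Takao Tokyo Institute of Technology, School of Engineering, Professor (70153616)
Co-Investigator(Kenkyū-buntansha)	Koriyama Tomoki Tokyo Institute of Technology, School of Engineering, Assistant Professor (50749124)
Research Collaborator	MOUNGSRI Decha, NAGAHAMA Daiki, NOSE Takashi, ARIFIANTO Dhany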
Project Period (FY)	2015-04-01 – 2018-03-31
Project Status	Completed (Fiscal Year 2017)
Budget Amount	¥13,000,000 (Direct Cost: ¥10,000,000, Indirect Cost: ¥3,000,000); Fiscal Year 2017: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000); Fiscal Year 2016: ¥4,030,000 (Direct Cost: ¥3,100,000, Indirect Cost: ¥930,000); Fiscal Year 2015: ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
Keywords	text-to-speech synthesis / statistical parametric speech synthesis / prosody generation / Gaussian process regression / GPR-based speech synthesis / HMM-based speech synthesis / machine learning / deep learning / speech information processing / deep Gaussian process
Outline of Final Research Achievements	The purpose of the research is to develop a novel statistical parametric speech synthesis framework based on Gaussian process regression (GPR). We have proposed prosody generation techniques including pitch pattern prediction and phone duration prediction as well as the spectral parameter generation technique based on GPR. We developed a GPR-based speech synthesis system and showed its effectiveness through assessment of synthetic speech quality. Furthermore, we examined the proposed framework for generating expressive speech. We also examined it for generating more natural-sounding prosody in speech synthesis of a tonal language.

Report

(4 results)

2017 Annual Research Report / Final Research Report (PDF)
2016 Annual Research Report
2015 Annual Research Report

Research Products
(49 results)

Journal Article (25 results) (of which Peer Reviewed: 9 results, Open Access: 5 results, Acknowledgement Compliant: 17 results)
Presentation (24 results) (of which Int'l Joint Research: 7 results, Invited: 1 result)

[Journal Article] GPR-based Thai speech synthesis using multi-level duration prediction (2018)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Speech Communication
  
  Volume: 99 Pages: 114-123
- DOI
  10.1016/j.specom.2018.03.005
- Related Report
  2017 Annual Research Report
- Peer Reviewed
[Journal Article] A study of statistical speech synthesis based on a GP-DNN hybrid model [GP-DNNハイブリッドモデルに基づく統計的音声合成の検討] (2018)
- Author(s)
  郡山知樹, 小林隆夫
- Journal Title
  
  電子情報通信学会技術研究報告(SP)
  
  Volume: 117(393) Pages: 5-10
- NAID
  40021473756
- Related Report
  2017 Annual Research Report
[Journal Article] A study on the use of deep Gaussian processes in GPR-based speech synthesis [GPR音声合成における深層ガウス過程の利用の検討] (2018)
- Author(s)
  郡山知樹, 小林隆夫
- Journal Title
  
  電子情報通信学会技術研究報告(SP)
  
  Volume: 117(517) Pages: 27-32
- NAID
  120006705503
- Related Report
  2017 Annual Research Report
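[Journal Article] A study of data partitioning methods for style adaptation using piecewise linear transformation in GPR-based speech synthesis [GPR音声合成における区分線形変換を用いたスタイル適応のためのデータ分割法の検討] (2018)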
- Author(s)
  前野雄也, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2018年春季研究発表会講演論文集
  
  Volume: - Pages: 295-296
- Related Report
  2017 Annual Research Report
[Journal Article] A study on the use of deep architectures in GPR-based speech synthesis [GPR音声合成における深層構造の利用の検討] (2018)
- Author(s)
  郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2018年春季研究発表会講演論文集
  
  Volume: - Pages: 1507-1508
- NAID
  120006705491
- Related Report
  2017 Annual Research Report
[Journal Article] Speaker Adaptation Using Shared Context Clustering for Cross-lingual Speech Synthesis (2017)
- Author(s)
  長濱大樹, 能勢隆, 郡山知樹, 小林隆夫
- Journal Title
  
  電子情報通信学会論文誌D 情報・システム
  
  Volume: J100-D Issue: 3 Pages: 385-393
- DOI
  10.14923/transinfj.2016PDP0020
- ISSN
  1880-4535, 1881-0225
- Year and Date
  2017-03-01
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Enhanced F0 generation for GPR-based speech synthesis considering syllable-based prosodic features (2017)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proceedings of APSIPA Annual Summit and Conference 2017
  
  Volume: - Pages: 1-4
- Related Report
  2017 Annual Research Report
- Peer Reviewed / Open Access
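[Journal Article] A study of decision tree construction based on frame context kernels for GPR-based speech synthesis [GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討] (2017)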
- Author(s)
  郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2017年秋季研究発表会講演論文集
  
  Volume: - Pages: 177-178
- NAID
  120006705316
- Related Report
  2017 Annual Research Report
[Journal Article] A study of singing voice synthesis based on Gaussian process regression [ガウス過程回帰に基づく歌声合成の検討] (2017)
- Author(s)
  郡山知樹, 岡野祐紀, 小林隆夫
- Journal Title
  
  日本音響学会2017年秋季研究発表会講演論文集
  
  Volume: - Pages: 295-296
- NAID
  120006705394
- Related Report
  2017 Annual Research Report
[Journal Article] Duration prediction using multiple Gaussian process experts for GPR-based speech synthesis (2017)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proc. 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017)
  
  Volume: － Pages: 5945-5948
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] A study of the effect of automatic accent labeling on synthetic speech quality [アクセント情報自動ラベリングの音声合成品質への影響に関する検討] (2017)
- Author(s)
  増子理菜, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2017年春季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 283-284
- Related Report
  2016 Annual Research Report
- Acknowledgement Compliant
[Journal Article] Synthesis of audiobook speech based on GPR-based speech synthesis [GPR音声合成に基づいたオーディオブック音声の合成] (2017)
- Author(s)
  津野駿幸, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2017年春季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 295-296
- Related Report
  2016 Annual Research Report
- Acknowledgement Compliant
[Journal Article] GMM-based voice conversion with non-parallel data based on context-aware phoneme matching [コンテキストを考慮した音素マッチングに基づく非パラレルデータGMM声質変換] (2017)
- Author(s)
  高橋亮, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2017年春季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 367-368
- Related Report
  2016 Annual Research Report
- Acknowledgement Compliant
[Journal Article] Tone modeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proc. the 8th International Conference on Speech Prosody (SPEECH PROSODY 2016)
  
  Volume: － Pages: 1014-1018
- DOI
  10.21437/speechprosody.2016-208
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Unsupervised stress information labeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proc. 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016)
  
  Volume: － Pages: 1591-1595
- DOI
  10.21437/interspeech.2016-273
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
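[Journal Article] A study of style adaptation using piecewise linear feature transformation in GPR-based speech synthesis [GPR音声合成における区分線形特徴量変換を用いたスタイル適応の検討] (2016)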
- Author(s)
  前野雄也, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2016年秋季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 213-214
- Related Report
  2016 Annual Research Report
- Acknowledgement Compliant
[Journal Article] A study of GMM-based voice conversion using non-parallel data [非パラレルデータを用いるGMM声質変換の検討] (2016)
- Author(s)
  高橋亮, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2016年秋季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 267-268
- Related Report
  2016 Annual Research Report
- Acknowledgement Compliant
[Journal Article] A speaker adaptation technique for Gaussian process regression based speech synthesis using feature space transform (2016)
- Author(s)
  Tomoki Koriyama, Syohei Oshio, Takao Kobayashi
- Journal Title
  
  Proc. 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
  
  Volume: ICASSP Pages: 5610-5614
- NAID
  120006704514
- Related Report
  2015 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Evaluation of automatic accent estimation based on CRF/HMM for speech synthesis [音声合成のためのCRF/HMMに基づく自動アクセント推定の評価] (2016)
- Author(s)
  増子理菜, 郡山知樹, 小林隆夫
- Journal Title
  
  電子情報通信学会技術研究報告〔音声〕
  
  Volume: 115/SP2015-85 Pages: 1-6
- Related Report
  2015 Annual Research Report
- Acknowledgement Compliant
[Journal Article] A study of style adaptation in GPR-based speech synthesis [GPR音声合成におけるスタイル適応の検討] (2016)
- Author(s)
  前野雄也, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2016年春季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 233-234
- Related Report
  2015 Annual Research Report
- Acknowledgement Compliant
[Journal Article] A study of GPR-based speech synthesis with diverse styles [多様なスタイルによるGPR音声合成の検討] (2016)
- Author(s)
  岡元伶洋, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2016年春季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 361-362
- Related Report
  2015 Annual Research Report
- Acknowledgement Compliant
[Journal Article] Duration prediction using multi-level model for GPR-based speech synthesis (2015)
- Author(s)
  Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH)
  
  Volume: INTERSPEECH Pages: 1591-1595
- Related Report
  2015 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data (2015)
- Author(s)
  Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proc. 16th Annual Conference of the International Speech Communication Association (INTERSPEECH)
  
  Volume: INTERSPEECH Pages: 3496-3500
- Related Report
  2015 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] A study of speaker adaptation methods in GPR-based speech synthesis [GPR音声合成における話者適応手法の検討] (2015)
- Author(s)
  押尾翔平, 郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2015年秋季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 219-220
- Related Report
  2015 Annual Research Report
- Acknowledgement Compliant
[Journal Article] Evaluation of a speech synthesis system based on Gaussian process regression [ガウス過程回帰に基づく音声合成システムの評価] (2015)
- Author(s)
  郡山知樹, 小林隆夫
- Journal Title
  
  日本音響学会2015年秋季研究発表会講演論文集
  
  Volume: CD-ROM Pages: 235-236
- NAID
  120006704045
- Related Report
  2015 Annual Research Report
- Acknowledgement Compliant
[Presentation] A study of statistical speech synthesis based on a GP-DNN hybrid model [GP-DNNハイブリッドモデルに基づく統計的音声合成の検討] (2018)
- Author(s)
  郡山知樹
- Organizer
  電子情報通信学会音声研究会
- Related Report
  2017 Annual Research Report
[Presentation] A study on the use of deep Gaussian processes in GPR-based speech synthesis [GPR音声合成における深層ガウス過程の利用の検討] (2018)
- Author(s)
  郡山知樹
- Organizer
  電子情報通信学会音声研究会
- Related Report
  2017 Annual Research Report
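[Presentation] A study of data partitioning methods for style adaptation using piecewise linear transformation in GPR-based speech synthesis [GPR音声合成における区分線形変換を用いたスタイル適応のためのデータ分割法の検討] (2018)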
- Author(s)
  前野雄也
- Organizer
  日本音響学会2018年春季研究発表会
- Related Report
  2017 Annual Research Report
[Presentation] A study on the use of deep architectures in GPR-based speech synthesis [GPR音声合成における深層構造の利用の検討] (2018)
- Author(s)
  郡山知樹
- Organizer
  日本音響学会2018年春季研究発表会
- Related Report
  2017 Annual Research Report
[Presentation] A study of the effect of automatic accent labeling on synthetic speech quality [アクセント情報自動ラベリングの音声合成品質への影響に関する検討] (2017)
- Author(s)
  増子理菜
- Organizer
  日本音響学会2017年春季研究発表会
- Place of Presentation
  Meiji University Ikuta Campus (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
[Presentation] Synthesis of audiobook speech based on GPR-based speech synthesis [GPR音声合成に基づいたオーディオブック音声の合成] (2017)
- Author(s)
  津野駿幸
- Organizer
  日本音響学会2017年春季研究発表会
- Place of Presentation
  Meiji University Ikuta Campus (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
[Presentation] GMM-based voice conversion with non-parallel data based on context-aware phoneme matching [コンテキストを考慮した音素マッチングに基づく非パラレルデータGMM声質変換] (2017)
- Author(s)
  高橋亮
- Organizer
  日本音響学会2017年春季研究発表会
- Place of Presentation
  Meiji University Ikuta Campus (Kawasaki, Kanagawa)
- Year and Date
  2017-03-15
- Related Report
  2016 Annual Research Report
[Presentation] Duration prediction using multiple Gaussian process experts for GPR-based speech synthesis (2017)
- Author(s)
  Decha Moungsri
- Organizer
  2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
- Place of Presentation
  Hilton New Orleans Riverside (USA)
- Year and Date
  2017-03-05
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Enhanced F0 generation for GPR-based speech synthesis considering syllable-based prosodic features (2017)
- Author(s)
  Decha Moungsri
- Organizer
  APSIPA Annual Summit and Conference 2017
- Related Report
  2017 Annual Research Report
- Int'l Joint Research
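[Presentation] Efforts toward speech synthesis with diverse speaker characteristics and styles for expressive speech synthesis [表現豊かな音声合成に向けた多様な話者性とスタイルによる音声合成への取組み] (2017)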
- Author(s)
  小林隆夫
- Organizer
  第19回音声言語シンポジウム
- Related Report
  2017 Annual Research Report
- Invited
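[Presentation] A study of decision tree construction based on frame context kernels for GPR-based speech synthesis [GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討] (2017)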
- Author(s)
  郡山知樹
- Organizer
  日本音響学会2017年秋季研究発表会
- Related Report
  2017 Annual Research Report
[Presentation] A study of singing voice synthesis based on Gaussian process regression [ガウス過程回帰に基づく歌声合成の検討] (2017)
- Author(s)
  郡山知樹
- Organizer
  日本音響学会2017年秋季研究発表会
- Related Report
  2017 Annual Research Report
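[Presentation] A study of style adaptation using piecewise linear feature transformation in GPR-based speech synthesis [GPR音声合成における区分線形特徴量変換を用いたスタイル適応の検討] (2016)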
- Author(s)
  前野雄也
- Organizer
  日本音響学会2016年秋季研究発表会
- Place of Presentation
  University of Toyama (Toyama, Toyama)
- Year and Date
  2016-09-14
- Related Report
  2016 Annual Research Report
[Presentation] A study of GMM-based voice conversion using non-parallel data [非パラレルデータを用いるGMM声質変換の検討] (2016)
- Author(s)
  高橋亮
- Organizer
  日本音響学会2016年秋季研究発表会
- Place of Presentation
  University of Toyama (Toyama, Toyama)
- Year and Date
  2016-09-14
- Related Report
  2016 Annual Research Report
[Presentation] Unsupervised stress information labeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri
- Organizer
  17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
- Place of Presentation
  Hyatt Regency San Francisco (USA)
- Year and Date
  2016-09-08
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Tone modeling using Gaussian process latent variable model for statistical speech synthesis (2016)
- Author(s)
  Decha Moungsri
- Organizer
  the 8th International Conference on Speech Prosody, SPEECH PROSODY 2016
- Place of Presentation
  Boston University (USA)
- Year and Date
  2016-05-31
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] A speaker adaptation technique for Gaussian process regression based speech synthesis using feature space transform (2016)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  2016 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2016
- Place of Presentation
  Shanghai International Convention Center (China)
- Year and Date
  2016-03-20
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] A study of GPR-based speech synthesis with diverse styles [多様なスタイルによるGPR音声合成の検討] (2016)
- Author(s)
  岡元伶洋, 郡山知樹, 小林隆夫
- Organizer
  日本音響学会2016年春季研究発表会
- Place of Presentation
  Toin University of Yokohama (Yokohama, Kanagawa)
- Year and Date
  2016-03-09
- Related Report
  2015 Annual Research Report
[Presentation] A study of style adaptation in GPR-based speech synthesis [GPR音声合成におけるスタイル適応の検討] (2016)
- Author(s)
  前野雄也, 郡山知樹, 小林隆夫
- Organizer
  日本音響学会2016年春季研究発表会
- Place of Presentation
  Toin University of Yokohama (Yokohama, Kanagawa)
- Year and Date
  2016-03-09
- Related Report
  2015 Annual Research Report
[Presentation] Evaluation of automatic accent estimation based on CRF/HMM for speech synthesis [音声合成のためのCRF/HMMに基づく自動アクセント推定の評価] (2016)
- Author(s)
  増子理菜, 郡山知樹, 小林隆夫
- Organizer
  電子情報通信学会・日本音響学会音声研究会
- Place of Presentation
  Sunpian Kawasaki (Kawasaki, Kanagawa)
- Year and Date
  2016-01-14
- Related Report
  2015 Annual Research Report
[Presentation] A study of speaker adaptation methods in GPR-based speech synthesis [GPR音声合成における話者適応手法の検討] (2015)
- Author(s)
  押尾翔平, 郡山知樹, 小林隆夫
- Organizer
  日本音響学会2015年秋季研究発表会
- Place of Presentation
  University of Aizu (Aizuwakamatsu, Fukushima)
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
[Presentation] Evaluation of a speech synthesis system based on Gaussian process regression [ガウス過程回帰に基づく音声合成システムの評価] (2015)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  日本音響学会2015年秋季研究発表会
- Place of Presentation
  University of Aizu (Aizuwakamatsu, Fukushima)
- Year and Date
  2015-09-16
- Related Report
  2015 Annual Research Report
[Presentation] Duration prediction using multi-level model for GPR-based speech synthesis (2015)
- Author(s)
  Decha Moungsri, 郡山知樹, 小林隆夫
- Organizer
  16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
- Place of Presentation
  International Congress Center Dresden (Germany)
- Year and Date
  2015-09-06
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
[Presentation] A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data (2015)
- Author(s)
  郡山知樹, 小林隆夫
- Organizer
  16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015
- Place of Presentation
  International Congress Center Dresden (Germany)
- Year and Date
  2015-09-06
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
