A study of speech synthesis for achieving synthetic speech with high quality and variability based on hybrid approach

Research Project

Project/Area Number	25730106
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	Tohoku University
Principal Investigator	NOSE Takashi Tohoku University, Graduate School of Engineering, Lecturer (90550591)
Project Period (FY)	2013-04-01 – 2015-03-31
Project Status	Completed (Fiscal Year 2014)
Budget Amount	¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000); Fiscal Year 2014: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000); Fiscal Year 2013: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
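Keywords	statistical speech synthesis / non-linguistic information / paralinguistic information / prosody / multilingual / singing voice synthesis / parameter generation / hidden Markov model / Gaussian process regression / multiple-regression hidden semi-Markov model / emphatic expression / speech synthesis / hybrid / high quality / diversification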
Outline of Final Research Achievements	The purpose of this research was to establish a hybrid speech synthesis framework that can synthesize human-like speech with various emotional expressions and/or speaking styles using only a limited amount of speech data. We addressed the following six issues: (1) flexible control of non-linguistic and paralinguistic information appearing in synthetic speech; (2) automatic training of prosodic variations; (3) extension to multilingual and cross-lingual speech synthesis; (4) application to singing voice synthesis; (5) efficient design of speech corpora for synthesis; and (6) improvement of the subjective quality of synthetic speech by modifying the conventional parameter generation method.

Report

(3 results)

2014 Annual Research Report, Final Research Report (PDF)
2013 Research-status Report

Research Products
(22 results)


Journal Article (14 results; of which Peer Reviewed: 8 results, Acknowledgement Compliant: 7 results), Presentation (8 results)

[Journal Article] Prosodic variation enhancement using unsupervised context labeling for HMM-based expressive speech synthesis (2014)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Journal Title
  
  Speech Communication
  
  Volume: 57 Pages: 144-154
- DOI
  10.1016/j.specom.2013.09.014
- Related Report
  2014 Annual Research Report, 2013 Research-status Report
- Peer Reviewed
[Journal Article] A parameter generation algorithm using local variance for HMM-based speech synthesis (2014)
- Author(s)
  Takashi Nose, Vataya Chunwijitra, Takao Kobayashi
- Journal Title
  
  IEEE Journal of Selected Topics in Signal Processing
  
  Volume: 8 Issue: 2 Pages: 221-228
- DOI
  10.1109/jstsp.2013.2283459
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Statistical Parametric Speech Synthesis Based on Gaussian Process Regression (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  IEEE Journal of Selected Topics in Signal Processing
  
  Volume: 8 Issue: 2 Pages: 173-183
- DOI
  10.1109/jstsp.2013.2283461
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Speech parameter generation based on Gaussian process regression considering intra-sequence variation (2014)
- Author(s)
  郡山知樹,能勢隆,小林隆夫
- Journal Title
  
  Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan
  
  Volume: 1 Pages: 355-356
- NAID
  120006702995
- Related Report
  2014 Annual Research Report
- Acknowledgement Compliant
[Journal Article] A study on hyperparameter optimization in speech synthesis based on Gaussian process regression (2014)
- Author(s)
  郡山知樹,能勢隆,小林隆夫
- Journal Title
  
  IEICE Technical Report
  
  Volume: 113 Pages: 19-24
- Related Report
  2014 Annual Research Report
- Acknowledgement Compliant
[Journal Article] Parametric speech synthesis based on Gaussian process regression using global variance and hyperparameter optimization (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing
  
  Volume: 1 Pages: 3862-3866
- NAID
  120006703288
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] A study on F0 pattern generation based on Gaussian process regression (2014)
- Author(s)
  郡山知樹,能勢隆,小林隆夫
- Journal Title
  
  Proceedings of the 2014 Autumn Meeting of the Acoustical Society of Japan
  
  Volume: 1 Pages: 247-248
- NAID
  120006703360
- Related Report
  2014 Annual Research Report
- Acknowledgement Compliant
[Journal Article] Parametric speech synthesis using local and global sparse Gaussian processes (2014)
- Author(s)
  Tomoki Koriyama, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proceedings of 24th IEEE International Workshop on Machine Learning for Signal Processing
  
  Volume: 1 Pages: 1-6
- NAID
  120006703336
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Evaluation of cross-lingual speech synthesis based on speaker adaptation using shared decision trees (2014)
- Author(s)
  長濱大樹, 能勢隆, 郡山知樹, 小林隆夫
- Journal Title
  
  Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan
  
  Volume: 1 Pages: 413-414
- Related Report
  2013 Research-status Report
[Journal Article] Evaluation of a sentence selection algorithm considering phonetic and prosodic contexts for speech synthesis (2014)
- Author(s)
  荒生侑介, 能勢隆, 郡山知樹, 篠崎隆宏, 小林隆夫
- Journal Title
  
  Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan
  
  Volume: 1 Pages: 405-406
- Related Report
  2013 Research-status Report
[Journal Article] Speaker-independent style conversion for HMM-based expressive speech synthesis (2013)
- Author(s)
  Hiroki Kanagawa, Takashi Nose, Takao Kobayashi
- Journal Title
  
  Proc. 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
  
  Volume: 1 Pages: 7864-7867
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Journal Article] HMM-based expressive speech synthesis based on phrase-level F0 context labeling (2013)
- Author(s)
  Yu Maeno, Takashi Nose, Takao Kobayashi, Tomoki Koriyama, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
- Journal Title
  
  Proc. 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
  
  Volume: 1 Pages: 7859-7863
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Journal Article] A style control technique for singing voice synthesis based on multiple-regression HSMM (2013)
- Author(s)
  Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi
- Journal Title
  
  Proc. 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
  
  Volume: 1 Pages: 378-382
- Related Report
  2013 Research-status Report
- Peer Reviewed
[Journal Article] A study on character speech synthesis based on sentence selection from multi-domain corpora (2013)
- Author(s)
  荒生侑介, 能勢隆, 篠崎隆宏, 小林隆夫
- Journal Title
  
  Proceedings of the 2013 Autumn Meeting of the Acoustical Society of Japan
  
  Volume: 1 Pages: 351-352
- Related Report
  2013 Research-status Report
[Presentation] A study on F0 pattern generation based on Gaussian process regression (2014)
- Author(s)
  郡山知樹
- Organizer
  2014 Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  Hokkai-Gakuen University (Sapporo, Hokkaido)
- Year and Date
  2014-09-02
- Related Report
  2014 Annual Research Report
[Presentation] Singing style control for HMM-based expressive singing voice synthesis (2014)
- Author(s)
  Takashi Nose
- Organizer
  The 7th seminar of A3 foresight program
- Place of Presentation
  Seoul National University (Korea)
- Year and Date
  2014-05-26
- Related Report
  2014 Annual Research Report
[Presentation] Speaker-independent style conversion for HMM-based expressive speech synthesis
- Author(s)
  Hiroki Kanagawa
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
- Place of Presentation
  Vancouver, Canada
- Related Report
  2013 Research-status Report
[Presentation] HMM-based expressive speech synthesis based on phrase-level F0 context labeling
- Author(s)
  Yu Maeno
- Organizer
  2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
- Place of Presentation
  Vancouver, Canada
- Related Report
  2013 Research-status Report
[Presentation] A style control technique for singing voice synthesis based on multiple-regression HSMM
- Author(s)
  Takashi Nose
- Organizer
  14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
- Place of Presentation
  Lyon, France
- Related Report
  2013 Research-status Report
[Presentation] A study on character speech synthesis based on sentence selection from multi-domain corpora
- Author(s)
  荒生侑介
- Organizer
  2013 Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  Toyohashi University of Technology
- Related Report
  2013 Research-status Report
[Presentation] Evaluation of cross-lingual speech synthesis based on speaker adaptation using shared decision trees
- Author(s)
  長濱大樹
- Organizer
  2014 Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Nihon University
- Related Report
  2013 Research-status Report
[Presentation] Evaluation of a sentence selection algorithm considering phonetic and prosodic contexts for speech synthesis
- Author(s)
  荒生侑介
- Organizer
  2014 Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Nihon University
- Related Report
  2013 Research-status Report
