Advanced method of prosody control in statistical-based speech synthesis using generation process model of fundamental frequency contours

Research Project

Project/Area Number	24300068
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Partial Multi-year Fund
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	The University of Tokyo
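Principal Investigator	HIROSE Keikichi, The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
Co-Investigator(Kenkyū-buntansha)	MINEMATSU Nobuaki, The University of Tokyo, Graduate School of Engineering, Professor (90273333); SAITO Daisuke, The University of Tokyo, Graduate School of Engineering, Assistant Professor (40615150)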
Project Period (FY)	2012-04-01 – 2015-03-31
Project Status	Completed (Fiscal Year 2014)
Budget Amount	¥17,810,000 (Direct Cost: ¥13,700,000, Indirect Cost: ¥4,110,000); Fiscal Year 2014: ¥5,460,000 (Direct Cost: ¥4,200,000, Indirect Cost: ¥1,260,000); Fiscal Year 2013: ¥5,590,000 (Direct Cost: ¥4,300,000, Indirect Cost: ¥1,290,000); Fiscal Year 2012: ¥6,760,000 (Direct Cost: ¥5,200,000, Indirect Cost: ¥1,560,000)
Keywords	fundamental frequency contour / generation process model / statistical speech synthesis / prosody control / speech conversion / discourse focus / multi-stream training / matrix-variate GMM / HMM-based speech synthesis / Deep Neural Network / multi-stream / statistical modeling / voice conversion / focus control / Chinese speech / tone nucleus model
Outline of Final Research Achievements	This project aimed to realize flexible prosody control and higher speech quality in statistical speech synthesis by imposing the constraints of the generation process model of fundamental frequency (F0) contours. Several methods were developed, including one that trains HMMs on F0 contours approximated by the model. In this method, the hierarchical components of the model-based F0 contour were handled as separate streams in a multi-stream scheme, enabling prosody control that keeps a clear relationship with linguistic information. Lexical emphasis was realized by manipulating the model commands (prosody conversion). Speaker (voice) conversion in the multi-speaker case was improved using a matrix-variate Gaussian mixture model and a deep neural network with speaker-dependent sub-networks. Work was also carried out on Chinese, including preliminary experiments on speech translation.
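The outline above relies on the generation process model of F0 contours (the Fujisaki model cited in the publication list), in which ln F0 is the sum of a baseline value, phrase components and accent components, each driven by commands. The following is a minimal Python sketch assuming the commonly cited form of that model; the time constants ALPHA, BETA and GAMMA are typical literature values assumed here, not values from this project, and `emphasize` is a hypothetical helper that only illustrates how scaling one accent command's amplitude raises F0 locally, in the spirit of the command manipulation described above. It is not the project's implementation.

```python
import math

# Assumed standard form of the generation process model:
#   ln F0(t) = ln Fb + sum_i Ap_i * Gp(t - T0_i)
#                    + sum_j Aa_j * [Ga(t - T1_j) - Ga(t - T2_j)]
ALPHA = 3.0   # phrase control time constant (1/s), assumed typical value
BETA = 20.0   # accent control time constant (1/s), assumed typical value
GAMMA = 0.9   # ceiling of the accent step response, assumed typical value


def phrase_response(t: float) -> float:
    """Impulse response Gp(t) of the phrase control mechanism."""
    return ALPHA * ALPHA * t * math.exp(-ALPHA * t) if t >= 0 else 0.0


def accent_response(t: float) -> float:
    """Step response Ga(t) of the accent control mechanism, capped at GAMMA."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """Return ln F0 at time t (seconds).

    phrase_cmds: list of (T0, Ap) impulse (phrase) commands.
    accent_cmds: list of (T1, T2, Aa) pedestal (accent) commands.
    """
    phrase = sum(ap * phrase_response(t - t0) for t0, ap in phrase_cmds)
    accent = sum(aa * (accent_response(t - t1) - accent_response(t - t2))
                 for t1, t2, aa in accent_cmds)
    return math.log(fb) + phrase + accent


def emphasize(accent_cmds, index, gain):
    """Hypothetical focus control: scale the amplitude of one accent command."""
    return [(t1, t2, aa * gain if j == index else aa)
            for j, (t1, t2, aa) in enumerate(accent_cmds)]


if __name__ == "__main__":
    phrases = [(0.0, 0.5)]                        # one phrase command at t = 0
    accents = [(0.2, 0.5, 0.4), (0.7, 1.0, 0.3)]  # two accent commands
    focused = emphasize(accents, 1, 1.5)          # emphasize the second accent
    for t in (0.3, 0.9):
        plain = math.exp(log_f0(t, 80.0, phrases, accents))
        emph = math.exp(log_f0(t, 80.0, phrases, focused))
        print(f"t={t:.1f}s  F0={plain:.1f} Hz  emphasized F0={emph:.1f} Hz")
```

Because the components add in the log-F0 domain, changing one accent command's amplitude affects F0 only inside that command's time span, which is what makes command-level manipulation a convenient handle for emphasis and focus.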

Report

(4 results)

2014 Annual Research Report, Final Research Report (PDF)
2013 Annual Research Report
2012 Annual Research Report

Research Products
(37 results)


Journal Article (18 results, of which Peer Reviewed: 16, Acknowledgement Compliant: 3), Presentation (18 results, of which Invited: 5), Book (1 result)

[Journal Article] Automatic Estimation of Parameters of the Generation Process Model and Its Use for HMM-Based Speech Synthesis (2015)
- Author(s)
  橋本浩弥, 齋藤大輔, 峯松信明, 広瀬啓吉
- Journal Title
  
  IEICE Transactions on Information and Systems (Japanese Edition), D
  
  Volume: J98-D Issue: 3 Pages: 481-491
- DOI
  10.14923/transinfj.2014PDP0030
- ISSN
  1880-4535, 1881-0225
- Year and Date
  2015-03-01
- Related Report
  2014 Annual Research Report
- Peer Reviewed
[Journal Article] Control of Prosodic Focus Based on Command Differences in Generation Process Model of Fundamental Frequency Contours (2015)
- Author(s)
  越智景子, 広瀬啓吉, 峯松信明
- Journal Title
  
  IEICE Transactions on Information and Systems (Japanese Edition), D
  
  Volume: J98-D Issue: 3 Pages: 524-533
- DOI
  10.14923/transinfj.2014JDP7084
- ISSN
  1880-4535, 1881-0225
- Year and Date
  2015-03-01
- Related Report
  2014 Annual Research Report
- Peer Reviewed
[Journal Article] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)
- Author(s)
  Ya Li, Jianhua Tao, Keikichi Hirose, Wei Lai, Xiaoying Xu
- Journal Title
  
  Proceedings of International Conference on Speech Prosody
  
  Volume: 1 Pages: 1032-1036
- Related Report
  2014 Annual Research Report
- Peer Reviewed
[Journal Article] Selection of training data for HMM-based speech synthesis from prosodic features - Use of generation process model of fundamental frequency contours - (2014)
- Author(s)
  Tomoyuki Mizukami, Hiroya Hashimoto, Keikichi Hirose, Daisuke Saito, and Nobuaki Minematsu
- Journal Title
  
  Proceedings of International Conference on Speech Prosody
  
  Volume: 1 Pages: 1042-1046
- Related Report
  2014 Annual Research Report, 2013 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Robust pitch estimation using ensemble empirical mode decomposition (2014)
- Author(s)
  Sujan Kumar Roy, Md. Khademul Islam Molla, Keikichi Hirose
- Journal Title
  
  Proceedings of International Conference on Speech Prosody
  
  Volume: 1 Pages: 534-538
- Related Report
  2014 Annual Research Report
- Peer Reviewed
[Journal Article] Application of matrix variate Gaussian mixture model to statistical voice conversion (2014)
- Author(s)
  Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose
- Journal Title
  
  Proceedings INTERSPEECH 2014
  
  Volume: 1 Pages: 2504-2508
- Related Report
  2014 Annual Research Report
- Peer Reviewed
[Journal Article] Use of generation process model for controlling fundamental frequencies in HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Proceedings of Forum Acusticum
  
  Volume: 1 Pages: 1-6
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Tensor representation for speaker characteristics in speech (2014)
- Author(s)
  Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
- Journal Title
  
  Proceedings of Forum Acusticum
  
  Volume: 1 Pages: 1-5
- Related Report
  2014 Annual Research Report
- Peer Reviewed
[Journal Article] Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose, Hiroya Hashimoto, Kyota Hyakutake, Daisuke Saito, Nobuaki Minematsu
- Journal Title
  
  Proceedings IEEE International Conference on Signal Processing
  
  Volume: 1 Pages: 555-560
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Voice conversion based on matrix variate gaussian mixture model (2014)
- Author(s)
  Daisuke Saito, H. Doi, Nobuaki Minematsu, Keikichi Hirose
- Journal Title
  
  Proceedings IEEE International Conference on Signal Processing
  
  Volume: 1 Pages: 567-576
- Related Report
  2014 Annual Research Report
- Peer Reviewed
[Journal Article] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)
- Author(s)
  Ya Li, Jianhua Tao, Keikichi Hirose, Wei Lai, and Xiaoying Xu
- Journal Title
  
  Proceedings of International Conference on Speech Prosody
  
  Volume: 1 Pages: 1032-1036
- Related Report
  2013 Annual Research Report
- Peer Reviewed
[Journal Article] Control of fundamental frequencies in HMM-based speech synthesis using generation process model (2014)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Proceedings of International Symposium on Frontiers of Research on Speech and Music
  
  Volume: 1 Pages: 96-100
- Related Report
  2013 Annual Research Report
[Journal Article] Generation of fundamental frequency contours for Thai speech using the tone nucleus model (2013)
- Author(s)
  Oraphan Krityakien, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  
  Journal of Signal Processing, Research Institute of Signal Processing
  
  Volume: 16 Pages: 135-138
- NAID
  130004849292
- Related Report
  2013 Annual Research Report
- Peer Reviewed
[Journal Article] Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese (2013)
- Author(s)
  Hiroya Hashimoto, Keikichi Hirose and Nobuaki Minematsu
- Journal Title
  
  Proceedings 8th ISCA Workshop on Speech Synthesis
  
  Volume: 1 Pages: 35-39
- Related Report
  2013 Annual Research Report
- Peer Reviewed
[Journal Article] Toward flexible and systematic control of fundamental frequencies in HMM-based speech synthesis (2013)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Journal of English Phonetics Society of Japan
  
  Volume: 18 Pages: 121-128
- Related Report
  2013 Annual Research Report
[Journal Article] Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis (2012)
- Author(s)
  Tatsuya Matsuda, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  
  Acoustical Science and Technology, Acoustical Society of Japan
  
  Volume: 33 Pages: 221-228
- NAID
  130001853341
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Journal Article] A method for generation of Mandarin F0 contours based on tone nucleus model and superpositional model (2012)
- Author(s)
  Qinghua Sun, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  
  Speech Communication
  
  Volume: 54 Pages: 932-945
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Journal Article] Improved automatic extraction of generation process model commands and its use for generating fundamental frequency contours for training HMM-based (2012)
- Author(s)
  Hiroya Hashimoto, Keikichi Hirose, and Nobuaki Minematsu
- Journal Title
  
  Proceedings INTERSPEECH
  
  Volume: CD Pages: 1-4
- Related Report
  2012 Annual Research Report
- Peer Reviewed
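[Presentation] Hierarchical representation of fundamental frequency contours by the generation process model and multi-stream training for HMM-based speech synthesis (2015)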
- Author(s)
  島田智大
- Organizer
  Acoustical Society of Japan, Spring Meeting
- Place of Presentation
  Chuo University, Bunkyo-ku, Tokyo
- Year and Date
  2015-03-16 – 2015-03-18
- Related Report
  2014 Annual Research Report
[Presentation] Voice conversion based on deep neural networks with multiple output sub-networks (2014)
- Author(s)
  橋本哲弥
- Organizer
  IEICE Technical Committee on Speech
- Place of Presentation
  Tokyo Institute of Technology (Suzukakedai Campus), Yokohama
- Year and Date
  2014-12-15 – 2014-12-16
- Related Report
  2014 Annual Research Report
[Presentation] Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose
- Organizer
  IEEE International Conference on Signal Processing
- Place of Presentation
  Hangzhou, China
- Year and Date
  2014-10-19 – 2014-10-23
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] Voice conversion based on matrix variate gaussian mixture model (2014)
- Author(s)
  Daisuke Saito
- Organizer
  IEEE International Conference on Signal Processing
- Place of Presentation
  Hangzhou, China
- Year and Date
  2014-10-19 – 2014-10-23
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] Application of matrix variate Gaussian mixture model to statistical voice conversion (2014)
- Author(s)
  Daisuke Saito
- Organizer
  INTERSPEECH 2014
- Place of Presentation
  Changi, Singapore
- Year and Date
  2014-09-14 – 2014-09-18
- Related Report
  2014 Annual Research Report
[Presentation] Use of generation process model for controlling fundamental frequencies in HMM-based speech synthesis (2014)
- Author(s)
  Keikichi Hirose
- Organizer
  Forum Acusticum 2014
- Place of Presentation
  Krakow, Poland
- Year and Date
  2014-09-07 – 2014-09-12
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] Tensor representation for speaker characteristics in speech (2014)
- Author(s)
  Daisuke Saito
- Organizer
  Forum Acusticum 2014
- Place of Presentation
  Krakow, Poland
- Year and Date
  2014-09-07 – 2014-09-12
- Related Report
  2014 Annual Research Report
- Invited
[Presentation] Many-to-one voice conversion by deep learning using speaker-dependent sub-networks (2014)
- Author(s)
  橋本哲哉
- Organizer
  Acoustical Society of Japan, Autumn Meeting
- Place of Presentation
  Hokkai-Gakuen University, Sapporo
- Year and Date
  2014-09-03 – 2014-09-05
- Related Report
  2014 Annual Research Report
[Presentation] Mixture models of matrix-variate normal distributions and their application to voice conversion (2014)
- Author(s)
  齋藤大輔
- Organizer
  IPSJ Special Interest Group on Spoken Language Processing
- Place of Presentation
  Hotel Hanamaki, Hanamaki
- Year and Date
  2014-07-24 – 2014-07-26
- Related Report
  2014 Annual Research Report
[Presentation] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)
- Author(s)
  Ya Li
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
- Related Report
  2014 Annual Research Report, 2013 Annual Research Report
[Presentation] Selection of training data for HMM-based speech synthesis from prosodic features - Use of generation process model of fundamental frequency contours - (2014)
- Author(s)
  Tomoyuki Mizukami
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
- Related Report
  2014 Annual Research Report, 2013 Annual Research Report
[Presentation] Robust pitch estimation using ensemble empirical mode decomposition (2014)
- Author(s)
  Sujan Kumar Roy
- Organizer
  International Conference on Speech Prosody
- Place of Presentation
  Dublin, Ireland
- Year and Date
  2014-05-20 – 2014-05-23
- Related Report
  2014 Annual Research Report
[Presentation] Control of fundamental frequencies in HMM-based speech synthesis using generation process model (2014)
- Author(s)
  Keikichi Hirose
- Organizer
  International Symposium on Frontiers of Research on Speech and Music
- Place of Presentation
  Mysore, India
- Related Report
  2013 Annual Research Report
- Invited
[Presentation] An experimental study of HMM-based speech synthesis considering F0 contour differences in the generation process model (2014)
- Author(s)
  百武恭汰
- Organizer
  Acoustical Society of Japan, Annual Meeting
- Place of Presentation
  Nihon University, Tokyo
- Related Report
  2013 Annual Research Report
[Presentation] A study of voice conversion based on matrix-variate Gaussian mixture models (2014)
- Author(s)
  土井秀信
- Organizer
  Acoustical Society of Japan, Annual Meeting
- Place of Presentation
  Nihon University, Tokyo
- Related Report
  2013 Annual Research Report
[Presentation] Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model (2013)
- Author(s)
  Keikichi Hirose
- Organizer
  INTERSPEECH 2013
- Place of Presentation
  Lyon, France
- Related Report
  2013 Annual Research Report
[Presentation] Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese (2013)
- Author(s)
  Hiroya Hashimoto
- Organizer
  8th ISCA Workshop on Speech Synthesis
- Place of Presentation
  Barcelona, Spain
- Related Report
  2013 Annual Research Report
[Presentation] Use of generation process model for synthesizing fundamental frequency contours in HMM-based speech synthesis (2012)
- Author(s)
  Keikichi Hirose
- Organizer
  IEEE International Conference on Signal Processing
- Place of Presentation
  Beijing, China (invited talk)
- Year and Date
  2012-10-22
- Related Report
  2012 Annual Research Report
[Book] Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis (2015)
- Author(s)
  Keikichi Hirose, Jianhua Tao (editors)
- Total Pages
  213
- Publisher
  Springer-Verlag
- Related Report
  2014 Annual Research Report