
Advanced method of prosody control in statistical-based speech synthesis using generation process model of fundamental frequency contours

Research Project

Project/Area Number 24300068
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Partial Multi-year Fund
Section General
Research Field Perception information processing/Intelligent robotics
Research Institution The University of Tokyo

Principal Investigator

HIROSE Keikichi  The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)

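Co-Investigator (Kenkyū-buntansha) MINEMATSU Nobuaki  The University of Tokyo, Graduate School of Engineering, Professor (90273333)
SAITO Daisuke  The University of Tokyo, Graduate School of Engineering, Assistant Professor (40615150)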
Project Period (FY) 2012-04-01 – 2015-03-31
Project Status Completed (Fiscal Year 2014)
Budget Amount
¥17,810,000 (Direct Cost: ¥13,700,000, Indirect Cost: ¥4,110,000)
Fiscal Year 2014: ¥5,460,000 (Direct Cost: ¥4,200,000, Indirect Cost: ¥1,260,000)
Fiscal Year 2013: ¥5,590,000 (Direct Cost: ¥4,300,000, Indirect Cost: ¥1,290,000)
Fiscal Year 2012: ¥6,760,000 (Direct Cost: ¥5,200,000, Indirect Cost: ¥1,560,000)
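Keywords fundamental frequency contour / generation process model / statistical speech synthesis / prosody control / speech conversion / discourse focus / multi-stream training / matrix-variate GMM / HMM-based speech synthesis / Deep Neural Network / multi-stream / statistical modeling / voice conversion / focus control / Chinese speech / tone nucleus model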
Outline of Final Research Achievements

Research was conducted with the aim of achieving flexible prosody control and better speech quality in statistical speech synthesis by applying the constraints of the generation process model of fundamental frequency (F0) contours. Several methods were developed. One of them trains the HMMs on F0 contours approximated by the model. In that method, the hierarchical F0 components given by the model were handled separately in a multi-stream scheme. This improved prosody control while keeping a clear relation to linguistic information. Lexical emphasis was realized by manipulating the model commands (prosody conversion). Voice conversion in the multi-speaker case was improved with two approaches: a matrix-variate Gaussian mixture model, and a deep neural network with speaker-dependent sub-networks. Work was also carried out for Chinese, including preliminary experiments on speech translation.
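The generation process model of F0 contours used throughout this project is commonly known as the Fujisaki model. In its standard form, ln F0(t) is the sum of three terms:

- a baseline ln Fb;
- phrase components, Gp(t) = α²t·exp(−αt), which respond to impulse-like phrase commands;
- accent components, Ga(t) = min[1 − (1 + βt)·exp(−βt), γ], which respond to rectangular accent commands.

The Python sketch below evaluates that standard formulation. It is an illustration, not the project's implementation. Several details are assumptions:

- The values α = 3.0 s⁻¹, β = 20.0 s⁻¹ and γ = 0.9 are conventional defaults chosen here.
- The names PhraseCommand, AccentCommand, f0_at and scale_accent are hypothetical.
- Scaling one accent command amplitude (scale_accent) is only a crude picture of controlling emphasis by manipulating model commands. It does not reproduce the published focus-control method.

```python
import math
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class PhraseCommand:
    onset: float       # T0: phrase command time in seconds
    amplitude: float   # Ap: phrase command magnitude


@dataclass(frozen=True)
class AccentCommand:
    onset: float       # T1: accent command onset in seconds
    offset: float      # T2: accent command offset in seconds
    amplitude: float   # Aa: accent command amplitude


def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Impulse response of the phrase control mechanism, Gp(t) = alpha^2 t exp(-alpha t)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0.0 else 0.0


def accent_response(t: float, beta: float = 20.0, gamma: float = 0.9) -> float:
    """Step response of the accent control mechanism, clipped at the ceiling gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_at(t: float, fb: float, phrases: List[PhraseCommand],
          accents: List[AccentCommand], alpha: float = 3.0,
          beta: float = 20.0, gamma: float = 0.9) -> float:
    """Return F0 in Hz at time t: ln F0 = ln Fb + phrase terms + accent terms."""
    log_f0 = math.log(fb)
    for p in phrases:
        log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
    for a in accents:
        # An accent command acts as a step up at T1 and a step down at T2.
        log_f0 += a.amplitude * (
            accent_response(t - a.onset, beta, gamma)
            - accent_response(t - a.offset, beta, gamma)
        )
    return math.exp(log_f0)


def scale_accent(accents: List[AccentCommand], index: int,
                 factor: float) -> List[AccentCommand]:
    """Hypothetical helper: copy the command list, scaling one accent amplitude."""
    return [replace(a, amplitude=a.amplitude * factor) if i == index else a
            for i, a in enumerate(accents)]


if __name__ == "__main__":
    phrases = [PhraseCommand(onset=0.0, amplitude=0.5)]
    accents = [AccentCommand(0.2, 0.5, 0.4), AccentCommand(0.8, 1.1, 0.3)]
    emphasized = scale_accent(accents, 1, 1.5)
    for k in range(0, 150, 10):
        t = k / 100.0
        print(f"{t:4.2f} s  {f0_at(t, 120.0, phrases, accents):6.1f} Hz  "
              f"{f0_at(t, 120.0, phrases, emphasized):6.1f} Hz")
```

Running the script prints the contour sampled every 0.1 s twice: once as given, and once with the second accent command scaled by 1.5. The two columns coincide until that command's onset at 0.8 s and diverge after it. Working in the log-F0 domain is what makes the components add.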

Report

(4 results)
  • 2014 Annual Research Report, Final Research Report
  • 2013 Annual Research Report
  • 2012 Annual Research Report
Research Products

    (37 results)


Journal Article: 18 results (of which Peer Reviewed: 16, Acknowledgement Compliant: 3) / Presentation: 18 results (of which Invited: 5) / Book: 1 result

  • [Journal Article] Automatic Estimation of Parameters of the Generation Process Model and Its Use for HMM-Based Speech Synthesis (2015)

    • Author(s)
      橋本浩弥, 齋藤大輔, 峯松信明, 広瀬啓吉
    • Journal Title

      電子情報通信学会論文誌D 情報・システム (IEICE Transactions on Information and Systems, Japanese Edition)

      Volume: J98-D Issue: 3 Pages: 481-491

    • DOI

      10.14923/transinfj.2014PDP0030

    • ISSN
      1880-4535, 1881-0225
    • Year and Date
      2015-03-01
    • Related Report
      2014 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Control of Prosodic Focus Based on Command Differences in Generation Process Model of Fundamental Frequency Contours (2015)

    • Author(s)
      越智景子, 広瀬啓吉, 峯松信明
    • Journal Title

      電子情報通信学会論文誌D 情報・システム (IEICE Transactions on Information and Systems, Japanese Edition)

      Volume: J98-D Issue: 3 Pages: 524-533

    • DOI

      10.14923/transinfj.2014JDP7084

    • ISSN
      1880-4535, 1881-0225
    • Year and Date
      2015-03-01
    • Related Report
      2014 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)

    • Author(s)
      Ya Li, Jianhua Tao, Keikichi Hirose, Wei Lai, Xiaoying Xu
    • Journal Title

      Proceedings of International Conference on Speech Prosody

      Volume: 1 Pages: 1032-1036

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Selection of training data for HMM-based speech synthesis from prosodic features: Use of generation process model of fundamental frequency contours (2014)

    • Author(s)
      Tomoyuki Mizukami, Hiroya Hashimoto, Keikichi Hirose, Daisuke Saito, and Nobuaki Minematsu
    • Journal Title

      Proceedings of International Conference on Speech Prosody

      Volume: 1 Pages: 1042-1046

    • Related Report
      2014 Annual Research Report, 2013 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] Robust pitch estimation using ensemble empirical mode decomposition (2014)

    • Author(s)
      Sujan Kumar Roy, Md. Khademul Islam Molla, Keikichi Hirose
    • Journal Title

      Proceedings of International Conference on Speech Prosody

      Volume: 1 Pages: 534-538

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Application of matrix variate Gaussian mixture model to statistical voice conversion (2014)

    • Author(s)
      Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose
    • Journal Title

      Proceedings INTERSPEECH 2014

      Volume: 1 Pages: 2504-2508

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Use of generation process model for controlling fundamental frequencies in HMM-based speech synthesis (2014)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings of Forum Acusticum

      Volume: 1 Pages: 1-6

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] Tensor representation for speaker characteristics in speech (2014)

    • Author(s)
      Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
    • Journal Title

      Proceedings of Forum Acusticum

      Volume: 1 Pages: 1-5

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis (2014)

    • Author(s)
      Keikichi Hirose, Hiroya Hashimoto, Kyota Hyakutake, Daisuke Saito, Nobuaki Minematsu
    • Journal Title

      Proceedings IEEE International Conference on Signal Processing

      Volume: 1 Pages: 555-560

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Journal Article] Voice conversion based on matrix variate Gaussian mixture model (2014)

    • Author(s)
      Daisuke Saito, H. Doi, Nobuaki Minematsu, Keikichi Hirose
    • Journal Title

      Proceedings IEEE International Conference on Signal Processing

      Volume: 1 Pages: 567-576

    • Related Report
      2014 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)

    • Author(s)
      Ya Li, Jianhua Tao, Keikichi Hirose, Wei Lai, and Xiaoying Xu
    • Journal Title

      Proceedings of International Conference on Speech Prosody

      Volume: 1 Pages: 1032-1036

    • Related Report
      2013 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Control of fundamental frequencies in HMM-based speech synthesis using generation process model (2014)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings of International Symposium on Frontiers of Research on Speech and Music

      Volume: 1 Pages: 96-100

    • Related Report
      2013 Annual Research Report
  • [Journal Article] Generation of fundamental frequency contours for Thai speech using the tone nucleus model (2013)

    • Author(s)
      Oraphan Krityakien, Keikichi Hirose, and Nobuaki Minematsu
    • Journal Title

      Journal of Signal Processing, Research Institute of Signal Processing

      Volume: 16 Pages: 135-138

    • NAID

      130004849292

    • Related Report
      2013 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese (2013)

    • Author(s)
      Hiroya Hashimoto, Keikichi Hirose and Nobuaki Minematsu
    • Journal Title

      Proceedings 8th ISCA Workshop on Speech Synthesis

      Volume: 1 Pages: 35-39

    • Related Report
      2013 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Toward flexible and systematic control of fundamental frequencies in HMM-based speech synthesis (2013)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Journal of English Phonetics Society of Japan

      Volume: 18 Pages: 121-128

    • Related Report
      2013 Annual Research Report
  • [Journal Article] Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis (2012)

    • Author(s)
      Tatsuya Matsuda, Keikichi Hirose, and Nobuaki Minematsu
    • Journal Title

      Acoustical Science and Technology, Acoustical Society of Japan

      Volume: 33 Pages: 221-228

    • NAID

      130001853341

    • Related Report
      2012 Annual Research Report
    • Peer Reviewed
  • [Journal Article] A method for generation of Mandarin F0 contours based on tone nucleus model and superpositional model (2012)

    • Author(s)
      Qinghua Sun, Keikichi Hirose, and Nobuaki Minematsu
    • Journal Title

      Speech Communication

      Volume: 54 Pages: 932-945

    • Related Report
      2012 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Improved automatic extraction of generation process model commands and its use for generating fundamental frequency contours for training HMM-based (2012)

    • Author(s)
      Hiroya Hashimoto, Keikichi Hirose, and Nobuaki Minematsu
    • Journal Title

      Proceedings INTERSPEECH

      Volume: CD Pages: 1-4

    • Related Report
      2012 Annual Research Report
    • Peer Reviewed
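  • [Presentation] Hierarchical representation of fundamental frequency contours using the generation process model and multi-stream training for HMM-based speech synthesis (2015)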

    • Author(s)
      島田智大
    • Organizer
      Acoustical Society of Japan, Spring Meeting
    • Place of Presentation
      Chuo University, Bunkyo-ku, Tokyo
    • Year and Date
      2015-03-16 – 2015-03-18
    • Related Report
      2014 Annual Research Report
  • [Presentation] Voice conversion based on a deep neural network with multiple output sub-networks (2014)

    • Author(s)
      橋本哲弥
    • Organizer
      IEICE Technical Committee on Speech
    • Place of Presentation
      Tokyo Institute of Technology (Suzukakedai), Yokohama
    • Year and Date
      2014-12-15 – 2014-12-16
    • Related Report
      2014 Annual Research Report
  • [Presentation] Use of fundamental frequencies shaped by generation process model for HMM-based speech synthesis (2014)

    • Author(s)
      Keikichi Hirose
    • Organizer
      IEEE International Conference on Signal Processing
    • Place of Presentation
      Hangzhou, China
    • Year and Date
      2014-10-19 – 2014-10-23
    • Related Report
      2014 Annual Research Report
    • Invited
  • [Presentation] Voice conversion based on matrix variate Gaussian mixture model (2014)

    • Author(s)
      Daisuke Saito
    • Organizer
      IEEE International Conference on Signal Processing
    • Place of Presentation
      Hangzhou, China
    • Year and Date
      2014-10-19 – 2014-10-23
    • Related Report
      2014 Annual Research Report
    • Invited
  • [Presentation] Application of matrix variate Gaussian mixture model to statistical voice conversion (2014)

    • Author(s)
      Daisuke Saito
    • Organizer
      INTERSPEECH 2014
    • Place of Presentation
      Changi, Singapore
    • Year and Date
      2014-09-14 – 2014-09-18
    • Related Report
      2014 Annual Research Report
  • [Presentation] Use of generation process model for controlling fundamental frequencies in HMM-based speech synthesis (2014)

    • Author(s)
      Keikichi Hirose
    • Organizer
      Forum Acusticum 2014
    • Place of Presentation
      Krakow, Poland
    • Year and Date
      2014-09-07 – 2014-09-12
    • Related Report
      2014 Annual Research Report
    • Invited
  • [Presentation] Tensor representation for speaker characteristics in speech (2014)

    • Author(s)
      Daisuke Saito
    • Organizer
      Forum Acusticum 2014
    • Place of Presentation
      Krakow, Poland
    • Year and Date
      2014-09-07 – 2014-09-12
    • Related Report
      2014 Annual Research Report
    • Invited
  • [Presentation] Many-to-one voice conversion using deep learning with speaker-dependent sub-networks (2014)

    • Author(s)
      橋本哲哉
    • Organizer
      Acoustical Society of Japan, Autumn Meeting
    • Place of Presentation
      Hokkai-Gakuen University, Sapporo
    • Year and Date
      2014-09-03 – 2014-09-05
    • Related Report
      2014 Annual Research Report
  • [Presentation] Mixture model of matrix-variate normal distributions and its application to voice conversion (2014)

    • Author(s)
      齋藤大輔
    • Organizer
      IPSJ Special Interest Group on Spoken Language Processing
    • Place of Presentation
      Hotel Hanamaki, Hanamaki
    • Year and Date
      2014-07-24 – 2014-07-26
    • Related Report
      2014 Annual Research Report
  • [Presentation] Hierarchical stress generation with Fujisaki model in expressive speech synthesis (2014)

    • Author(s)
      Ya Li
    • Organizer
      International Conference on Speech Prosody
    • Place of Presentation
      Dublin, Ireland
    • Year and Date
      2014-05-20 – 2014-05-23
    • Related Report
      2014 Annual Research Report, 2013 Annual Research Report
  • [Presentation] Selection of training data for HMM-based speech synthesis from prosodic features: Use of generation process model of fundamental frequency contours (2014)

    • Author(s)
      Tomoyuki Mizukami
    • Organizer
      International Conference on Speech Prosody
    • Place of Presentation
      Dublin, Ireland
    • Year and Date
      2014-05-20 – 2014-05-23
    • Related Report
      2014 Annual Research Report, 2013 Annual Research Report
  • [Presentation] Robust pitch estimation using ensemble empirical mode decomposition (2014)

    • Author(s)
      Sujan Kumar Roy
    • Organizer
      International Conference on Speech Prosody
    • Place of Presentation
      Dublin, Ireland
    • Year and Date
      2014-05-20 – 2014-05-23
    • Related Report
      2014 Annual Research Report
  • [Presentation] Control of fundamental frequencies in HMM-based speech synthesis using generation process model (2014)

    • Author(s)
      Keikichi Hirose
    • Organizer
      International Symposium on Frontiers of Research on Speech and Music
    • Place of Presentation
      Mysore, India
    • Related Report
      2013 Annual Research Report
    • Invited
  • [Presentation] Experimental study of HMM-based speech synthesis taking into account F0 contour differences in the generation process model (2014)

    • Author(s)
      百武恭汰
    • Organizer
      Acoustical Society of Japan, National Meeting
    • Place of Presentation
      Nihon University, Tokyo
    • Related Report
      2013 Annual Research Report
  • [Presentation] A study of voice conversion based on matrix-variate Gaussian mixture models (2014)

    • Author(s)
      土井秀信
    • Organizer
      Acoustical Society of Japan, National Meeting
    • Place of Presentation
      Nihon University, Tokyo
    • Related Report
      2013 Annual Research Report
  • [Presentation] Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model (2013)

    • Author(s)
      Keikichi Hirose
    • Organizer
      INTERSPEECH 2013
    • Place of Presentation
      Lyon, France
    • Related Report
      2013 Annual Research Report
  • [Presentation] Context labels based on "bunsetsu" for HMM-based speech synthesis of Japanese (2013)

    • Author(s)
      Hiroya Hashimoto
    • Organizer
      8th ISCA Workshop on Speech Synthesis
    • Place of Presentation
      Barcelona, Spain
    • Related Report
      2013 Annual Research Report
  • [Presentation] Use of generation process model for synthesizing fundamental frequency contours in HMM-based speech synthesis (2012)

    • Author(s)
      Keikichi Hirose
    • Organizer
      IEEE International Conference on Signal Processing
    • Place of Presentation
      Beijing, China (invited talk)
    • Year and Date
      2012-10-22
    • Related Report
      2012 Annual Research Report
  • [Book] Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis (2015)

    • Author(s)
      Keikichi Hirose, Jianhua Tao (editors)
    • Total Pages
      213
    • Publisher
      Springer-Verlag
    • Related Report
      2014 Annual Research Report


Published: 2012-04-24   Modified: 2019-07-29  


Powered by NII kakenhi