2007 Fiscal Year Final Research Report Summary

Synthesis of speech in any speaking styles based on corpus-based generation of prosodic features using the generation process model

Research Project

Project/Area Number 17300055
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Perception information processing/Intelligent robotics
Research Institution The University of Tokyo

Principal Investigator

HIROSE Keikichi  The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)

Co-Investigator (Kenkyū-buntansha) MINEMATSU Nobuaki  The University of Tokyo, Graduate School of Frontier Sciences, Associate Professor (90273333)
Project Period (FY) 2005 – 2007
Keywords Generation process model / Fundamental frequency contour / Corpus-based method / Prosodic control / Speaking style / HMM speech synthesis / Focus control / Spoken dialogue system
Research Abstract

Research was conducted to establish a corpus-based speech synthesis method that is based on the generation process model of fundamental frequency contours and can generate high-quality speech in any speaking style. The original research plan was fulfilled, with the following results:
1. A method was developed to predict the command parameters of the generation process model using binary decision trees, with inputs such as linguistic information obtained by parsing the text, and thus to synthesize fundamental frequency contours. Integrated prosodic control was realized by combining this method with other binary-decision-tree methods that predict pause positions and lengths and phoneme durations. The validity of the method was shown through experiments on speech synthesis in various styles, including emotional speech. A method was also developed to automatically extract the command parameters from observed fundamental frequency contours using binary decision trees. The extraction accuracy was shown to increase when linguistic information of the text was included in the tree inputs.
2. Binary decision trees were constructed to predict deviations in phrase and accent commands of the utterances with specific focuses from those without. Their inputs are accent types and positions in sentences of the focused words, and command values of the corresponding parts of the utterances without specific focus. An appropriate focus control was realized by modifying the phrase and accent commands predicted by the method in section 1 based on the predicted deviations.
3. A two-step method was developed for generating fundamental frequency contours of Standard Chinese. It first generates phrase components in a corpus-based way, and then generates tone components in a corpus-based way. The method offers high flexibility in synthesizing fundamental frequency contours. As an example of flexible control, it was shown that proper focus control could be realized with a simple set of rules.
4. Speech synthesis systems were constructed for Japanese and Chinese by integrating the methods developed in sections 1 and 2 above with HMM speech synthesis. It was shown that our system could produce synthetic speech with higher naturalness than a "full" HMM synthesizer, in which prosodic control is done within the HMM framework. It was also shown that various styles of synthetic speech could be realized by our system.
5. Spoken dialogue systems for road guidance and TV program guidance were constructed using the above speech synthesis systems. The validity of the developed speech synthesis method was demonstrated through experiments on controlling the speaking style of reply speech depending on the user's character and situation.
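
As an illustration of the model behind results 1 and 2, the following is a minimal sketch of how a fundamental frequency contour can be computed from phrase and accent commands. It uses the formulation of the generation process model commonly given in the literature (often called the Fujisaki model): the logarithm of F0 is a baseline plus phrase components (impulse responses) plus accent components (step responses). The constants alpha, beta, and gamma, all function names, and the example command values are assumptions chosen for illustration and are not taken from this report. The helper apply_focus_deviation is a hypothetical stand-in for adding a predicted deviation to one accent command, and the decision trees that would predict the commands and deviations are not shown.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time in `times`.

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        # Phrase components: one impulse response per phrase command.
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        # Accent components: a step up at onset and a step down at offset.
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


def apply_focus_deviation(accent_cmds, index, amplitude_delta):
    """Hypothetical helper: add a deviation to one accent command's amplitude."""
    modified = list(accent_cmds)
    onset, offset, amplitude = modified[index]
    modified[index] = (onset, offset, amplitude + amplitude_delta)
    return modified


if __name__ == "__main__":
    times = [i * 0.01 for i in range(200)]   # 2 seconds at 10 ms steps
    phrases = [(0.0, 0.5)]                   # illustrative values only
    accents = [(0.3, 0.6, 0.4), (0.9, 1.3, 0.3)]
    neutral = f0_contour(times, 100.0, phrases, accents)
    focused = f0_contour(
        times, 100.0, phrases, apply_focus_deviation(accents, 0, 0.2)
    )
    print(max(neutral), max(focused))
```

Because the commands are few and interpretable, a change in speaking style or focus can be expressed as a small change in command values rather than as a frame-by-frame edit of the contour.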

  • Research Products

    (27 results)

All Journal Article (20 results) (of which Peer Reviewed: 7 results) Presentation (5 results) Book (2 results)

  • [Journal Article] Speech prosody in spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Journal of Signal Processing 12

      Pages: 7-16

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Journal Article] Improved prediction of tone components for F_0 contour generation of Mandarin speech based on tone nucleus model (2008)

    • Author(s)
      Qinghua Sun
    • Journal Title

      Proceedings of International Conference on Speech Prosody 1

      Pages: 95-98

    • Description
      From the Final Research Report Summary (Japanese version)
    • Peer Reviewed
  • [Journal Article] Speech prosody in spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Journal of Signal Processing Vol.12, No.1

      Pages: 7-16

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Improved prediction of tone components for F_0 contour generation of Mandarin speech based on tone nucleus model (2008)

    • Author(s)
      Qinghua Sun, Keikichi Hirose, Nobuaki Minematsu
    • Journal Title

      Proceedings of International Conference on Speech Prosody, Campinas Vol.1

      Pages: 95-98

    • Description
      From the Final Research Report Summary (English version)
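  • [Journal Article] Realization of concept-to-speech conversion for reply speech generation in a spoken dialogue system of road guidance and its evaluation (2007)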

    • Author(s)
      Yuji Yagi
    • Journal Title

      IPSJ (Information Processing Society of Japan) Journal 48

      Pages: 3300-3308

    • Description
      From the Final Research Report Summary (Japanese version)
    • Peer Reviewed
  • [Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (2007)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Archives of Acoustics 32

      Pages: 41-50

    • Description
      From the Final Research Report Summary (Japanese version)
  • [Journal Article] Realization of concept-to-speech conversion for reply speech generation in a spoken dialogue system of road guidance and its evaluation (2007)

    • Author(s)
      Yuji Yagi, Seiya Takada, Keikichi Hirose, Nobuaki Minematsu
    • Journal Title

      IPSJ (Information Processing Society of Japan) Journal vol.48, no.9

      Pages: 3300-3308

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (2007)

    • Author(s)
      Keikichi Hirose, Qinghua Sun, Nobuaki Minematsu
    • Journal Title

      Archives of Acoustics Vol.32, No.1

      Pages: 41-50

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)

    • Author(s)
      Jinfu Ni
    • Journal Title

      Speech Communication 48

      Pages: 989-1008

    • Description
      From the Final Research Report Summary (Japanese version)
    • Peer Reviewed
  • [Journal Article] Constrained tone transformation technique for separation and combination of Mandarin tone and intonation (2006)

    • Author(s)
      Jinfu Ni
    • Journal Title

      Journal of Acoustical Society of America 119

      Pages: 1764-1782

    • Description
      From the Final Research Report Summary (Japanese version)
    • Peer Reviewed
  • [Journal Article] Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses (2006)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings Interspeech 2006 1

      Pages: 305-308

    • Description
      From the Final Research Report Summary (Japanese version)
    • Peer Reviewed
  • [Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)

    • Author(s)
      Jinfu Ni, Keikichi Hirose
    • Journal Title

      Speech Communication Vol.48, No.8

      Pages: 989-1008

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Constrained tone transformation technique for separation and combination of Mandarin tone and intonation (2006)

    • Author(s)
      Jinfu Ni, Hisashi Kawai, Keikichi Hirose
    • Journal Title

      Journal of Acoustical Society of America Vol.119, No.3

      Pages: 1764-1782

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses (2006)

    • Author(s)
      Keikichi Hirose, Yasufumi Asano, Nobuaki Minematsu
    • Journal Title

      Proceedings Interspeech 2006, Pittsburgh Vol.1

      Pages: 305-308

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Speech Communication 46

      Pages: 385-404

    • Description
      From the Final Research Report Summary (Japanese version)
    • Peer Reviewed
  • [Journal Article] Corpus-based extraction of F_0 contour generation process model parameters (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings Interspeech 2005 1

      Pages: 3257-3260

    • Description
      From the Final Research Report Summary (Japanese version)
    • Peer Reviewed
  • [Journal Article] Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis (2005)

    • Author(s)
      Keikichi Hirose, Kentaro Sato, Yasufumi Asano, Nobuaki Minematsu
    • Journal Title

      Speech Communication Vol.46, Nos.3-4

      Pages: 385-404

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Corpus-based extraction of F_0 contour generation process model parameters (2005)

    • Author(s)
      Keikichi Hirose, Yusuke Furuyama, Nobuaki Minematsu
    • Journal Title

      Proceedings Interspeech 2005, Lisbon Vol.1

      Pages: 3257-3260

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Prosody generation based on generation process model (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Prosody in spoken language information processing -Science of accent, intonation, and rhythm- (218 pages), Edited by K. Hirose, Maruzen

      Pages: 109-118

    • Description
      From the Final Research Report Summary (English version)
  • [Journal Article] Modeling and generation of prosodic features (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Spoken Language Systems (347 pages), Edited by S. Nakagawa, M. Okada, and T. Kawahara, Ohm-sha

      Pages: 73-86

    • Description
      From the Final Research Report Summary (English version)
  • [Presentation] Generation of F_0 contours for Mandarin speech in combination with rule-based and corpus-based methods (2008)

    • Author(s)
      Keikichi Hirose, Qinghua Sun, Nobuaki Minematsu
    • Organizer
      8th Phonetics Conference of China/International Symposium on Phonetic Frontiers
    • Place of Presentation
      Beijing (Invited)
    • Year and Date
      2008-04-19
    • Description
      From the Final Research Report Summary (English version)
  • [Presentation] Researches on speech prosody for advanced spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Organizer
      Proceedings of International Symposium: Frontiers of Research on Speech and Music
    • Place of Presentation
      Kolkata
    • Year and Date
      2008-02-20
    • Description
      From the Final Research Report Summary (Japanese version)
  • [Presentation] Researches on speech prosody for advanced spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Organizer
      International Symposium : Frontiers of Research on Speech and Music
    • Place of Presentation
      Kolkata (Invited)
    • Year and Date
      2008-02-20
    • Description
      From the Final Research Report Summary (English version)
  • [Presentation] Corpus-based generation of prosodic features from text based on generation process model (2007)

    • Author(s)
      Keikichi Hirose
    • Organizer
      Interspeech 2007
    • Place of Presentation
      Antwerp
    • Year and Date
      2007-08-28
    • Description
      From the Final Research Report Summary (Japanese version)
  • [Presentation] Corpus-based synthesis of fundamental frequency contours using generation process model and automatic preparation of training corpora (2006)

    • Author(s)
      Keikichi Hirose
    • Organizer
      International Conference on Speech Databases and Assessment
    • Place of Presentation
      Penang (Keynote)
    • Year and Date
      2006-12-09
    • Description
      From the Final Research Report Summary (English version)
  • [Book] Prosody in spoken language information processing -Science of accent, intonation, and rhythm- (2005)

    • Author(s)
      Keikichi Hirose (editor and author)
    • Total Pages
      218
    • Publisher
      Maruzen
    • Description
      From the Final Research Report Summary (Japanese version)
  • [Book] Spoken Language Systems (Modeling and generation of prosodic features) (2005)

    • Author(s)
      Keikichi Hirose
    • Total Pages
      347 (contributed 14 pages)
    • Publisher
      Ohm-sha
    • Description
      From the Final Research Report Summary (Japanese version)

Published: 2010-02-04  
