
Synthesis of speech in any speaking styles based on corpus-based generation of prosodic features using the generation process model

Research Project

Project/Area Number 17300055
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Perception information processing/Intelligent robotics
Research Institution The University of Tokyo

Principal Investigator

HIROSE Keikichi  The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)

Co-Investigator (Kenkyū-buntansha) MINEMATSU Nobuaki  The University of Tokyo, Graduate School of Frontier Sciences, Associate Professor (90273333)
Project Period (FY) 2005 – 2007
Project Status Completed (Fiscal Year 2007)
Budget Amount
¥16,860,000 (Direct Cost: ¥15,300,000, Indirect Cost: ¥1,560,000)
Fiscal Year 2007: ¥6,760,000 (Direct Cost: ¥5,200,000, Indirect Cost: ¥1,560,000)
Fiscal Year 2006: ¥5,100,000 (Direct Cost: ¥5,100,000)
Fiscal Year 2005: ¥5,000,000 (Direct Cost: ¥5,000,000)
Keywords Generation process model / Fundamental frequency contour / Corpus-based method / Prosodic control / Speaking style / HMM speech synthesis / Focus control / Spoken dialogue system / Corpus-based prosodic control / Utterance focus / Two-step processing / Degree of emotion / Statistical method / Two-step method / Speech corpus / Various speaking styles / Automatic estimation / Emotion / Accent attributes
Research Abstract

Research was conducted to establish a corpus-based speech synthesis method that is based on the generation process model of fundamental frequency contours and can generate high-quality speech in any speaking style. The original research plan was fulfilled, with the following results:
1. A method was developed to predict the command parameters of the generation process model using binary decision trees, with inputs such as linguistic information obtained by parsing the text, and thereby to synthesize fundamental frequency contours. An integrated method of prosodic control was realized by combining this method with other binary-decision-tree methods that predict pause positions, pause lengths, and phoneme durations. The validity of the method was shown through speech synthesis experiments in various styles, including emotional speech. A method was also developed to automatically extract the command parameters from observed fundamental frequency contours using binary decision trees. The extraction accuracy was shown to increase when linguistic information about the text was added to the tree inputs.
2. Binary decision trees were constructed to predict how the phrase and accent commands of utterances with a specific focus deviate from those of utterances without one. Their inputs are the accent types and sentence positions of the focused words, and the command values of the corresponding parts of the utterances without specific focus. Appropriate focus control was realized by modifying the phrase and accent commands predicted by the method in item 1 according to the predicted deviations.
3. A two-step method was developed for generating fundamental frequency contours of Standard Chinese. It first generates phrase components in a corpus-based way, and then generates tone components in a corpus-based way. The method is highly flexible in synthesizing fundamental frequency contours. As an example of this flexibility, proper focus control was shown to be achievable with a simple set of rules.
4. Speech synthesis systems were constructed for Japanese and Chinese by integrating the methods developed in items 1 and 2 above with HMM speech synthesis. Our system was shown to produce synthetic speech with higher naturalness than a "full" HMM synthesizer, in which prosodic control is done within the HMM framework. It was also shown that our system could realize various styles of synthetic speech.
5. Spoken dialogue systems for road guidance and TV program guidance were constructed using the above speech synthesis systems. The validity of the developed speech synthesis method was demonstrated through experiments on controlling the speaking style of reply speech depending on the user's characteristics and situation.
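
For readers unfamiliar with the generation process model named in the abstract, the sketch below shows how phrase and accent commands can be turned into a fundamental frequency contour. It assumes the commonly published formulation of the model (a baseline frequency plus phrase and accent components summed in the log-frequency domain). The function names, the example command values, and the constants alpha = 3.0, beta = 20.0, and gamma = 0.9 are illustrative assumptions, not values taken from this project.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrase_cmds, accent_cmds,
               alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 values (Hz) at the given times (s).

    fb          -- baseline frequency in Hz (assumed value, chosen per speaker)
    phrase_cmds -- list of (onset_time, magnitude) impulses
    accent_cmds -- list of (onset_time, offset_time, amplitude) pedestals
    """
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        # Each phrase command adds a slowly decaying component.
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset, alpha)
        # Each accent command adds a local rise between onset and offset.
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset, beta, gamma)
                - accent_response(t - offset, beta, gamma)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical example: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
print(max(contour))
```

In the approach the abstract describes, the command timings and magnitudes passed to such a function would be predicted by binary decision trees from linguistic features rather than written by hand as they are in this example.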

Report

(4 results)
  • 2007 Annual Research Report   Final Research Report Summary
  • 2006 Annual Research Report
  • 2005 Annual Research Report

Research Products

(57 results)

Journal Article (48 results, of which Peer Reviewed: 9 results) / Presentation (6 results) / Book (3 results)

  • [Journal Article] Speech prosody in spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Journal of Signal Processing 12

      Pages: 7-16

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Annual Research Report 2007 Final Research Report Summary
  • [Journal Article] Improved prediction of tone components for F_0 contour generation of Mandarin speech based on tone nucleus model (2008)

    • Author(s)
      Qinghua Sun
    • Journal Title

      Proceedings of International Conference on Speech Prosody 1

      Pages: 95-98

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] Speech prosody in spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Journal of Signal Processing Vol.12, No.1

      Pages: 7-16

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Improved prediction of tone components for F_0 contour generation of Mandarin speech based on tone nucleus model (2008)

    • Author(s)
      Qinghua Sun, Keikichi Hirose, Nobuaki Minematsu
    • Journal Title

      Proceedings of International Conference on Speech Prosody, Campinas Vol.1

      Pages: 95-98

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
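  • [Journal Article] Realization of concept-to-speech conversion for reply speech generation in a spoken dialogue system of road guidance and its evaluation (2007)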

    • Author(s)
      八木裕司
    • Journal Title

      情報処理学会論文誌 48

      Pages: 3300-3308

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (2007)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Archives of Acoustics 32

      Pages: 41-50

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Realization of concept-to-speech conversion for reply speech generation in a spoken dialogue system of road guidance and its evaluation (2007)

    • Author(s)
      Yuji Yagi, Seiya Takada, Keikichi Hirose, Nobuaki Minematsu
    • Journal Title

      IPSJ (Information Processing Society of Japan) Journal vol.48, no.9

      Pages: 3300-3308

    • NAID

      110006423006

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (2007)

    • Author(s)
      Keikichi Hirose, Qinghua Sun, Nobuaki Minematsu
    • Journal Title

      Archives of Acoustics Vol.32, No.1

      Pages: 41-50

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
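  • [Journal Article] Realization of concept-to-speech conversion for reply speech generation in a spoken dialogue system of road guidance and its evaluation (2007)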

    • Author(s)
      八木 裕司
    • Journal Title

      情報処理学会論文誌 48

      Pages: 3300-3308

    • Related Report
      2007 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Corpus-based generation of prosodic features from text based on generation process model (2007)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings of Interspeech 1(CD-ROM)

      Pages: 1274-1277

    • Related Report
      2007 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (2007)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Archives of Acoustics 32・1

      Pages: 41-50

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Prosody in spoken language technologies (Special Lecture) (2007)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings of International Workshop on Nonlinear Circuits and Signal Processing(NCSP2007) CD-ROM

      Pages: 615-622

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Generation of sentence F0 contours for Mandarin speech synthesis by superposing tone components on phrase components (2007)

    • Author(s)
      Qinghua Sun
    • Journal Title

      Proceedings of International Workshop on Nonlinear Circuits and Signal Processing(NCSP2007) CD-ROM

      Pages: 317-320

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Rules and statistical learning for accent processing aimed at Japanese speech synthesis (2007)

    • Author(s)
      黒岩龍
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 301-302

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Evaluation of reply speech in a road guidance dialogue system (2007)

    • Author(s)
      八木裕司
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 9-10

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Corpus-based prosody generation from text using the generation process model of fundamental frequency contours and its evaluation (2007)

    • Author(s)
      越智景子
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 231-232

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)

    • Author(s)
      Jinfu Ni
    • Journal Title

      Speech Communication 48

      Pages: 989-1008

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] Constrained tone transformation technique for separation and combination of Mandarin tone and intonation (2006)

    • Author(s)
      Jinfu Ni
    • Journal Title

      Journal of Acoustical Society of America 119

      Pages: 1764-1782

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses (2006)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings Interspeech 2006 1

      Pages: 305-308

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)

    • Author(s)
      Jinfu Ni, Keikichi Hirose
    • Journal Title

      Speech Communication Vol.48, No.8

      Pages: 989-1008

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Constrained tone transformation technique for separation and combination of Mandarin tone and intonation (2006)

    • Author(s)
      Jinfu Ni, Hisashi Kawai, Keikichi Hirose
    • Journal Title

      Journal of Acoustical Society of America Vol.119, No.3

      Pages: 1764-1782

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses (2006)

    • Author(s)
      Keikichi Hirose, Yasufumi Asano, Nobuaki Minematsu
    • Journal Title

      Proceedings Interspeech 2006, Pittsburgh Vol.1

      Pages: 305-308

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)

    • Author(s)
      Jinfu Ni
    • Journal Title

      Speech Communication 62・5

      Pages: 370-378

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Modeling the effects of emphasis and question on fundamental frequency contours of Cantonese utterances (2006)

    • Author(s)
      Wentao Gu
    • Journal Title

      IEEE Transactions on Speech and Audio Processing 14・4

      Pages: 1155-1170

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Corpus-based synthesis of fundamental frequency contours using generation process model and automatic preparation of training corpora (Plenary) (2006)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Conference Guide for Oriental COCOSDA 2006 - International Conference on Speech Databases and Assessment

      Pages: 12-12

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Rule-based generation of phrase components in two-step synthesis of fundamental frequency contours of Mandarin (2006)

    • Author(s)
      Qinghua Sun
    • Journal Title

      Proceedings of International Conference on Speech Prosody 2

      Pages: 561-564

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses (2006)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings of Interspeech 2006-ICSLP CD-ROM

      Pages: 305-308

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Pause estimation for corpus-based prosodic control using the generation process model (2006)

    • Author(s)
      越智景子
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 269-270

    • Related Report
      2006 Annual Research Report
  • [Journal Article] Modeling and generation of prosodic features (2006)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Spoken Language Systems (Ohm-sha)

      Pages: 73-86

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)

    • Author(s)
      Jinfu Ni
    • Journal Title

      Speech Communication (to appear)

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Rule-based generation of phrase components in two-step synthesis of fundamental frequency contours of Mandarin (2006)

    • Author(s)
      Qinghua Sun
    • Journal Title

      Proc. International Conference on Speech Prosody (to appear)

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Prosodic control for speech synthesis in various speaking styles: from the viewpoint of emotional speech synthesis (Invited Talk) (2006)

    • Author(s)
      広瀬 啓吉
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 279-282

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Statistical prosodic control considering the degree of emotion at the bunsetsu (phrase) level (2006)

    • Author(s)
      浅野 泰史
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 213-214

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Evaluation of corpus-based automatic parameter extraction for the generation process model of fundamental frequency contours (2006)

    • Author(s)
      河村 美由紀
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 387-388

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Two-step generation of Chinese F_0 contours based on the tone nucleus model (2006)

    • Author(s)
      孫慶華
    • Journal Title

      電子情報通信学会技術研究報告(音声研究会) SP2005-159

      Pages: 55-60

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Speech Communication 46

      Pages: 385-404

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] Corpus-based extraction of F_0 contour generation process model parameters (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proceedings Interspeech 2005 1

      Pages: 3257-3260

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis (2005)

    • Author(s)
      Keikichi Hirose, Kentaro Sato, Yasufumi Asano, Nobuaki Minematsu
    • Journal Title

      Speech Communication Vol.46, Nos.3-4

      Pages: 385-404

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Corpus-based extraction of F_0 contour generation process model parameters (2005)

    • Author(s)
      Keikichi Hirose, Yusuke Furuyama, Nobuaki Minematsu
    • Journal Title

      Proceedings Interspeech 2005, Lisbon Vol.1

      Pages: 3257-3260

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Prosody generation based on generation process model (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Prosody in spoken language information processing -Science of accent, intonation, and rhythm- (218 pages), Edited by K. Hirose, Maruzen

      Pages: 109-118

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Modeling and generation of prosodic features (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Spoken Language Systems (347 pages), Edited by S. Nakagawa, M. Okada, and T. Kawahara, Ohm-sha

      Pages: 73-86

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] Flexible speech synthesis (2005)

    • Author(s)
      広瀬啓吉
    • Journal Title

      パートナーロボット資料集成(エヌ・ティー・エス) Chapter 2, Section 1

      Pages: 58-67

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Speech Communication 46・3-4

      Pages: 385-404

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Corpus-based extraction of F_0 contour generation process model parameters (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proc. 9th European Conference on Speech Communication and Technology (INTERSPEECH) CD-ROM

      Pages: 3257-3260

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Generation of fundamental frequency contours for Mandarin speech synthesis based on tone nucleus model (2005)

    • Author(s)
      Qinghua Sun
    • Journal Title

      Proc. 9th European Conference on Speech Communication and Technology (INTERSPEECH) CD-ROM

      Pages: 3625-3628

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Estimation of intonation variation with constrained tone transformations (2005)

    • Author(s)
      Jinfu Ni
    • Journal Title

      Proc. 9th European Conference on Speech Communication and Technology (INTERSPEECH) CD-ROM

      Pages: 1397-1400

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (Plenary Talk) (2005)

    • Author(s)
      Keikichi Hirose
    • Journal Title

      Proc. Speech Analysis, Synthesis and Recognition -Application of Phonetics- CD-ROM

    • Related Report
      2005 Annual Research Report
  • [Journal Article] Improvement of accent sandhi rules for Japanese text-to-speech synthesis (2005)

    • Author(s)
      黒岩 龍
    • Journal Title

      日本音響学会講演論文集 CD-ROM

      Pages: 427-428

    • Related Report
      2005 Annual Research Report
  • [Presentation] Generation of F_0 contours for Mandarin speech in combination with rule-based and corpus-based methods (2008)

    • Author(s)
      Keikichi Hirose, Qinghua Sun, Nobuaki Minematsu
    • Organizer
      8th Phonetics Conference of China/International Symposium on Phonetic Frontiers
    • Place of Presentation
      Beijing (Invited)
    • Year and Date
      2008-04-19
    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Presentation] Focus control in corpus-based prosodic control based on the generation process model of fundamental frequency contours (2008)

    • Author(s)
      越智 景子
    • Organizer
      日本音響学会
    • Place of Presentation
      千葉工業大学
    • Year and Date
      2008-03-18
    • Related Report
      2007 Annual Research Report
  • [Presentation] Researches on speech prosody for advanced spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Organizer
      Proceedings of International Symposium: Frontiers of Research on Speech and Music
    • Place of Presentation
      Kolkata
    • Year and Date
      2008-02-20
    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
  • [Presentation] Researches on speech prosody for advanced spoken language technologies (2008)

    • Author(s)
      Keikichi Hirose
    • Organizer
      International Symposium: Frontiers of Research on Speech and Music
    • Place of Presentation
      Kolkata (Invited)
    • Year and Date
      2008-02-20
    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Presentation] Corpus-based generation of prosodic features from text based on generation process model (2007)

    • Author(s)
      Keikichi Hirose
    • Organizer
      Interspeech 2007
    • Place of Presentation
      Antwerp
    • Year and Date
      2007-08-28
    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
  • [Presentation] Corpus-based synthesis of fundamental frequency contours using generation process model and automatic preparation of training corpora (2006)

    • Author(s)
      Keikichi Hirose
    • Organizer
      International Conference on Speech Databases and Assessment
    • Place of Presentation
      Penang (Keynote)
    • Year and Date
      2006-12-09
    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Book] Prosody in spoken language information processing -Science of accent, intonation, and rhythm- (2006)

    • Author(s)
      広瀬 啓吉 (editor and author)
    • Total Pages
      226
    • Publisher
      丸善
    • Related Report
      2005 Annual Research Report
  • [Book] Prosody in spoken language information processing -Science of accent, intonation, and rhythm- (2005)

    • Author(s)
      広瀬啓吉 (editor and author)
    • Total Pages
      218
    • Publisher
      丸善
    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary
  • [Book] Spoken Language Systems (Modeling and generation of prosodic features) (2005)

    • Author(s)
      Keikichi Hirose
    • Publisher
      オーム社
    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2007 Final Research Report Summary

Published: 2005-04-01   Modified: 2016-04-21  
