2007 Fiscal Year Final Research Report Summary

Synthesis of speech in any speaking styles based on corpus-based generation of prosodic features using the generation process model

Research Project

Project/Area Number	17300055
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Perception information processing/Intelligent robotics
Research Institution	The University of Tokyo
Principal Investigator	HIROSE Keikichi The University of Tokyo, Graduate School of Information Science and Technology, Professor (50111472)
Co-Investigator(Kenkyū-buntansha)	MINEMATSU Nobuaki The University of Tokyo, Graduate School of Frontier Sciences, Associate Professor (90273333)
Project Period (FY)	2005 – 2007
Keywords	Generation process model / Fundamental frequency contour / Corpus-based method / Prosodic control / Speaking style / HMM speech synthesis / Focus control / Spoken dialogue system
Research Abstract	This research aimed to establish a corpus-based speech synthesis method that is based on the generation process model of fundamental frequency contours and can generate high-quality speech in any speaking style. The original research plan was fulfilled, with the following results:
1. A method was developed to predict the command parameters of the generation process model using binary decision trees, with inputs such as linguistic information obtained by parsing the text, and thereby to synthesize fundamental frequency contours. Combining this method with other binary-decision-tree methods that predict pause positions, pause lengths, and phoneme durations yielded an integrated method of prosodic control. The validity of the method was shown through experiments on synthesizing speech in various styles, including emotional speech. A method was also developed to automatically extract the command parameters from observed fundamental frequency contours using binary decision trees. Extraction accuracy increased when linguistic information from the text was added to the inputs of the trees.
2. Binary decision trees were constructed to predict how the phrase and accent commands of utterances with a specific focus deviate from those of utterances without one. Their inputs are the accent type and sentence position of the focused word, and the command values of the corresponding part of the utterance without a specific focus. Appropriate focus control was realized by modifying the phrase and accent commands predicted by the method in item 1 according to the predicted deviations.
3. A two-step method was developed for generating fundamental frequency contours of Standard Chinese. It first generates phrase components in a corpus-based way, and then generates tone components in a corpus-based way. The method is highly flexible in synthesizing fundamental frequency contours. As an example of this flexibility, proper focus control was shown to be achievable with a simple set of rules.
4. Speech synthesis systems were constructed for Japanese and Chinese by integrating the methods developed in items 1 and 2 with HMM speech synthesis. These systems produced synthetic speech with higher naturalness than a "full" HMM synthesizer, in which prosodic control is done within the HMM framework. They were also shown to realize various styles of synthetic speech.
5. Spoken dialogue systems for road guidance and TV program guidance were constructed using the above speech synthesis systems. The validity of the developed speech synthesis method was demonstrated through experiments on controlling the speaking style of reply speech according to the user's character and situation.
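The abstract does not state the equations of the generation process model. As a minimal sketch, assuming the commonly published form of the model (often called the Fujisaki model), the logarithm of the fundamental frequency is a baseline plus the responses to impulse-like phrase commands and stepwise accent commands. The constants `ALPHA`, `BETA`, `GAMMA` and every command value below are illustrative assumptions, not figures from this report, and the function names are hypothetical.

```python
import math

# Assumed typical constants; the report does not give the values it used.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time in `times`.

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Illustrative commands: one phrase command at 0 s, one accent from 0.3 s to 0.8 s.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(times, 100.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
```

In the approach the abstract describes, the command timings and magnitudes passed to such a function would come from the binary decision trees of item 1, and focus control as in item 2 would adjust those commands before the contour is generated.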

Research Products
(27 results)

Journal Article (20 results, of which Peer Reviewed: 7 results) / Presentation (5 results) / Book (2 results)

[Journal Article] Speech prosody in spoken language technologies (2008)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Journal of Signal Processing 12
  
  Pages: 7-16
- Description
  From the Research Report Summary (Japanese)
[Journal Article] Improved prediction of tone components for F_0 contour generation of Mandarin speech based on tone nucleus model (2008)
- Author(s)
  Qinghua Sun
- Journal Title
  
  Proceedings of International Conference on Speech Prosody 1
  
  Pages: 95-98
- Description
  From the Research Report Summary (Japanese)
- Peer Reviewed
[Journal Article] Speech prosody in spoken language technologies (2008)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Journal of Signal Processing Vol.12, No.1
  
  Pages: 7-16
- Description
  From the Research Report Summary (English)
[Journal Article] Improved prediction of tone components for F_0 contour generation of Mandarin speech based on tone nucleus model (2008)
- Author(s)
  Qinghua Sun, Keikichi Hirose, Nobuaki Minematsu
- Journal Title
  
  Proceedings of International Conference on Speech Prosody, Campinas Vol.1
  
  Pages: 95-98
- Description
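  From the Research Report Summary (English)
[Journal Article] Realization of concept-to-speech conversion for reply speech generation in a spoken dialogue system of road guidance and its evaluation (2007)
- Author(s)
  Yuji Yagi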
- Journal Title
  
  IPSJ (Information Processing Society of Japan) Journal 48
  
  Pages: 3300-3308
- Description
  From the Research Report Summary (Japanese)
- Peer Reviewed
[Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (2007)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Archives of Acoustics 32
  
  Pages: 41-50
- Description
  From the Research Report Summary (Japanese)
[Journal Article] Realization of concept-to-speech conversion for reply speech generation in a spoken dialogue system of road guidance and its evaluation (2007)
- Author(s)
  Yuji Yagi, Seiya Takada, Keikichi Hirose, Nobuaki Minematsu
- Journal Title
  
  IPSJ (Information Processing Society of Japan) Journal vol.48, no.9
  
  Pages: 3300-3308
- Description
  From the Research Report Summary (English)
[Journal Article] Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models (2007)
- Author(s)
  Keikichi Hirose, Qinghua Sun, Nobuaki Minematsu
- Journal Title
  
  Archives of Acoustics Vol.32, No.1
  
  Pages: 41-50
- Description
  From the Research Report Summary (English)
[Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)
- Author(s)
  Jinfu Ni
- Journal Title
  
  Speech Communication 48
  
  Pages: 989-1008
- Description
  From the Research Report Summary (Japanese)
- Peer Reviewed
[Journal Article] Constrained tone transformation technique for separation and combination of Mandarin tone and intonation (2006)
- Author(s)
  Jinfu Ni
- Journal Title
  
  Journal of the Acoustical Society of America 119
  
  Pages: 1764-1782
- Description
  From the Research Report Summary (Japanese)
- Peer Reviewed
[Journal Article] Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses (2006)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Proceedings Interspeech 2006 1
  
  Pages: 305-308
- Description
  From the Research Report Summary (Japanese)
- Peer Reviewed
[Journal Article] Quantitative and structural modeling of voice fundamental frequency contours of speech in Mandarin (2006)
- Author(s)
  Jinfu Ni, Keikichi Hirose
- Journal Title
  
  Speech Communication Vol.48, No.8
  
  Pages: 989-1008
- Description
  From the Research Report Summary (English)
[Journal Article] Constrained tone transformation technique for separation and combination of Mandarin tone and intonation (2006)
- Author(s)
  Jinfu Ni, Hisashi Kawai, Keikichi Hirose
- Journal Title
  
  Journal of the Acoustical Society of America Vol.119, No.3
  
  Pages: 1764-1782
- Description
  From the Research Report Summary (English)
[Journal Article] Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses (2006)
- Author(s)
  Keikichi Hirose, Yasufumi Asano, Nobuaki Minematsu
- Journal Title
  
  Proceedings Interspeech 2006, Pittsburgh Vol.1
  
  Pages: 305-308
- Description
  From the Research Report Summary (English)
[Journal Article] Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis (2005)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Speech Communication 46
  
  Pages: 385-404
- Description
  From the Research Report Summary (Japanese)
- Peer Reviewed
[Journal Article] Corpus-based extraction of F_0 contour generation process model parameters (2005)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Proceedings Interspeech 2005 1
  
  Pages: 3257-3260
- Description
  From the Research Report Summary (Japanese)
- Peer Reviewed
[Journal Article] Synthesis of F_0 contours using generation process model parameters predicted from unlabeled corpora: Application to emotional speech synthesis (2005)
- Author(s)
  Keikichi Hirose, Kentaro Sato, Yasufumi Asano, Nobuaki Minematsu
- Journal Title
  
  Speech Communication Vol.46, Nos.3-4
  
  Pages: 385-404
- Description
  From the Research Report Summary (English)
[Journal Article] Corpus-based extraction of F_0 contour generation process model parameters (2005)
- Author(s)
  Keikichi Hirose, Yusuke Furuyama, Nobuaki Minematsu
- Journal Title
  
  Proceedings Interspeech 2005, Lisbon Vol.1
  
  Pages: 3257-3260
- Description
  From the Research Report Summary (English)
[Journal Article] Prosody generation based on generation process model (2005)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Prosody in spoken language information processing -Science of accent, intonation, and rhythm- (218 pages), Edited by K. Hirose, Maruzen
  
  Pages: 109-118
- Description
  From the Research Report Summary (English)
[Journal Article] Modeling and generation of prosodic features (2005)
- Author(s)
  Keikichi Hirose
- Journal Title
  
  Spoken Language Systems (347 pages), Edited by S. Nakagawa, M. Okada, and T. Kawahara, Ohm-sha
  
  Pages: 73-86
- Description
  From the Research Report Summary (English)
[Presentation] Generation of F_0 contours for Mandarin speech in combination with rule-based and corpus-based methods (2008)
- Author(s)
  Keikichi Hirose, Qinghua Sun, Nobuaki Minematsu
- Organizer
  8th Phonetics Conference of China/International Symposium on Phonetic Frontiers
- Place of Presentation
  Beijing (Invited)
- Year and Date
  2008-04-19
- Description
  From the Research Report Summary (English)
[Presentation] Researches on speech prosody for advanced spoken language technologies (2008)
- Author(s)
  Keikichi Hirose
- Organizer
  Proceedings of International Symposium: Frontiers of Research on Speech and Music
- Place of Presentation
  Kolkata
- Year and Date
  2008-02-20
- Description
  From the Research Report Summary (Japanese)
[Presentation] Researches on speech prosody for advanced spoken language technologies (2008)
- Author(s)
  Keikichi Hirose
- Organizer
  International Symposium: Frontiers of Research on Speech and Music
- Place of Presentation
  Kolkata (Invited)
- Year and Date
  2008-02-20
- Description
  From the Research Report Summary (English)
[Presentation] Corpus-based generation of prosodic features from text based on generation process model (2007)
- Author(s)
  Keikichi Hirose
- Organizer
  Interspeech 2007
- Place of Presentation
  Antwerp
- Year and Date
  2007-08-28
- Description
  From the Research Report Summary (Japanese)
[Presentation] Corpus-based synthesis of fundamental frequency contours using generation process model and automatic preparation of training corpora (2006)
- Author(s)
  Keikichi Hirose
- Organizer
  International Conference on Speech Databases and Assessment
- Place of Presentation
  Penang (Keynote)
- Year and Date
  2006-12-09
- Description
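  From the Research Report Summary (English)
[Book] Prosody in spoken language information processing -Science of accent, intonation, and rhythm- (2005)
- Author(s)
  Keikichi Hirose (editor and author)
- Total Pages
  218
- Publisher
  Maruzen
- Description
  From the Research Report Summary (Japanese)
[Book] Spoken Language Systems (Modeling and generation of prosodic features) (2005)
- Author(s)
  Keikichi Hirose
- Total Pages
  347 (14 pages contributed)
- Publisher
  Ohm-sha
- Description
  From the Research Report Summary (Japanese)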
