2002 Fiscal Year Final Research Report Summary

High-quality Speech Synthesis based on Accurate Analysis Method and Statistical Method

Research Project

Project/Area Number	12480079
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	HIROSE Keikichi Graduate School of Frontier Sciences, Professor (50111472)
Co-Investigator(Kenkyū-buntansha)	MINEMATSU Nobuaki Graduate School of Information Science and Technology, Associate Professor (90273333)
Project Period (FY)	2000 – 2002
Keywords	Statistical Speech Synthesis / Terminal Analogue Synthesis / Waveform Concatenative Synthesis / HMM Speech Synthesis / AR-HMM Model / Fundamental Frequency Contour / Generation Process Model / Emotional Speech Synthesis
Research Abstract	The original research plan aimed at high-quality speech synthesis by using accurate pole-zero information of the vocal tract transfer function for segmental feature generation and by applying functional model constraints to prosodic feature generation. The plan was accomplished with the following results:
1. Successive approximation was applied to ARX analysis, enabling accurate pole-zero estimation. The method was combined with our previously developed terminal analogue synthesizer to construct an analysis-synthesis workbench, with which we succeeded in improving the quality of liquid sounds.
2. A hybrid speech synthesizer combining terminal analogue synthesis and waveform concatenation was developed, and high-quality speech synthesis was realized.
3. A method for stable formant extraction was developed, based on AR-HMM modeling, in which the source waveform is represented by an HMM. A speech synthesis experiment showed that the method could generate high-quality speech even for a large change in F0 (fundamental frequency).
4. By adding the natural waveform of junction periods, in the spectral domain and with appropriate weighting, to the concatenated speech, we realized smooth spectral transitions. We also developed a method that effectively reduces the corpus size for concatenative synthesis by VQ weighted according to frequency.
5. After developing an HMM speech synthesizer, the amount of data needed for speaker adaptation was investigated from the viewpoint of speech quality. Good quality was shown to be obtainable with 10 or more sentences.
6. F0 contour generation was realized by estimating the parameters of the generation process model using statistical methods. High speech quality was obtained from only a small speech corpus by using linguistic information such as direct modification relations between words. We also succeeded in estimating accent phrase boundaries from text using the same statistical framework. Furthermore, F0 contour generation and phoneme length estimation were realized for emotional speech with good results.
7. A method for automatically estimating the commands of the F0 contour generation process model was realized. Using this method, a prosodic corpus was built. This corpus is indispensable for the F0 contour generation described above.
8. A rule for controlling mora duration in dialogue-like speech synthesis was constructed. A speech synthesis experiment showed the validity of the rule.
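The abstract does not give the equations of the generation process model used for F0 contours. As a rough illustration only, the sketch below assumes the commonly cited command-response formulation (often called the Fujisaki model), in which log F0 is a baseline plus phrase components (impulse responses) and accent components (step responses). The function names, the constants `ALPHA`, `BETA`, `GAMMA`, and the example commands are all assumptions for illustration, not values or code from this project.

```python
import math

# Assumed, commonly cited constants; not taken from this report.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time in `times` (seconds).

    phrase_cmds: list of (onset, magnitude) impulse commands.
    accent_cmds: list of (onset, offset, amplitude) stepwise commands.
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


if __name__ == "__main__":
    # Hypothetical commands: one phrase command and one accent command.
    times = [i * 0.01 for i in range(200)]
    f0 = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
    for t, hz in zip(times[::20], f0[::20]):
        print(f"{t:4.2f} s  {hz:6.1f} Hz")
```

Under this formulation, the statistical F0 generation of item 6 would amount to predicting the command timings and magnitudes from linguistic information, while the automatic estimation of item 7 would recover the same commands from observed F0 contours.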

Research Products
(30 results)

[Publications] Keikichi Hirose: "Temporal rate change of dialogue speech in prosodic units as compared to read speech" Speech Communication. 36・1-2. 97-111 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Keikichi Hirose: "Invitation to Speech Synthesis Research -Toward Realization of Flexible Synthesis-" Information Processing Society of Japan Magazine. 43・3. 321-324 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Shuichi Narusawa: "A method for automatic extraction of parameters of the fundamental frequency contour generation model" IPSJ (Information Processing Society of Japan) Journal. 43・7. 2155-2168 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Atsuhiro Sakurai: "Data-driven generation of F0 contours using a superpositional model" Speech Communication. (to be published). (2003)
- Description
  From the Research Report Summary (Japanese)
[Publications] Nobuyuki Nishizawa: "Development of a formant-based analysis-synthesis system and generation of high quality liquid sounds of Japanese" Proc. International Conference on Spoken Language Processing. 1. 725-728 (2000)
- Description
  From the Research Report Summary (Japanese)
[Publications] Atsuhiro Sakurai: "Modeling and generation of accentual phrase F0 contours based on discrete HMMs synchronized at mora-unit transitions" Proc. International Conference on Spoken Language Processing. 3. 259-262 (2000)
- Description
  From the Research Report Summary (Japanese)
[Publications] Keikichi Hirose: "Data-driven synthesis of fundamental frequency contours for TTS systems based on a generation process model" Proc. Speech Prosody 2002. 391-394 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Keikichi Hirose: "Improved corpus-based synthesis of fundamental frequency contours using generation process model" Proc. International Conference on Spoken Language Processing. 3. 2085-2088 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Nobuyuki Nishizawa: "Separation of voiced source characteristics and vocal tract transfer function characteristics for speech sounds by iterative analysis based on AR-HMM model" Proc. International Conference on Spoken Language Processing. 3. 1721-1724 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Keikichi Hirose: "Corpus-based synthesis of F0 contours for emotional speech using the generation process model" Proceedings 15th International Congress of Phonetic Sciences. (to be published). (2003)
- Description
  From the Research Report Summary (Japanese)
[Publications] Nobuyuki Nishizawa: "Formant speech synthesis partly using waveform concatenative synthesis" IEICE Technical Report (Speech). 35-42 (2001)
- Description
  From the Research Report Summary (Japanese)
[Publications] Nettre Benjamin: "An experimental study on concatenative speech synthesis using a fusion technique and VCV/VV units" IEICE Technical Report (Speech). 53-60 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Keikichi Hirose: "Para- and non-linguistic information in speech information processing" Record of Fall Meeting, Acoust. Soc. Japan. 1. 243-246 (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Wentao Gu: "Considerations on acoustic models for HMM-based Mandarin synthesis" Record of Spring Meeting, Acoust. Soc. Japan. (to be published). (2003)
- Description
  From the Research Report Summary (Japanese)
[Publications] Keikichi Hirose: "Yesterday and Today for the Spoken Language Researches (Corpus-based synthesis of fundamental frequency contours for TTS systems based on a generation process model)" TaeHakSa. 680(17) (2002)
- Description
  From the Research Report Summary (Japanese)
[Publications] Keikichi Hirose and Hiromichi Kawanami: "Temporal rate change of dialogue speech in prosodic units as compared to read speech" Speech Communication. 36, 1-2. 97-111 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Keikichi Hirose: "Invitation to Speech Synthesis Research -Toward Realization of Flexible Synthesis-" Information Processing Society of Japan Magazine. 43, 3. 321-324 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Shuichi Narusawa, Nobuaki Minematsu, Keikichi Hirose and Hiroya Fujisaki: "A method for automatic extraction of parameters of the fundamental frequency contour generation model" IPSJ (Information Processing Society of Japan) Journal. 43, 7. 2155-2168 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Atsuhiro Sakurai, Keikichi Hirose and Nobuaki Minematsu: "Data-driven generation of F0 contours using a superpositional model" Speech Communication. to be published. (2003)
- Description
  From the Research Report Summary (English)
[Publications] Nobuyuki Nishizawa, Nobuaki Minematsu and Keikichi Hirose: "Development of a formant-based analysis-synthesis system and generation of high quality liquid sounds of Japanese" Proc. International Conference on Spoken Language Processing. 1. 725-728 (2000)
- Description
  From the Research Report Summary (English)
[Publications] Atsuhiro Sakurai, Koji Iwano and Keikichi Hirose: "Modeling and generation of accentual phrase F0 contours based on discrete HMMs synchronized at mora-unit transitions" Proc. International Conference on Spoken Language Processing. 3. 259-262 (2000)
- Description
  From the Research Report Summary (English)
[Publications] Keikichi Hirose, Nobuaki Minematsu, and Masaya Eto: "Data-driven synthesis of fundamental frequency contours for TTS systems based on a generation process model" Proc. Speech Prosody 2002. 391-394 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Keikichi Hirose, Masaya Eto, and Nobuaki Minematsu: "Improved corpus-based synthesis of fundamental frequency contours using generation process model" Proc. International Conference on Spoken Language Processing. 3. 2085-2088 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Nobuyuki Nishizawa, Keikichi Hirose, and Nobuaki Minematsu: "Separation of voiced source characteristics and vocal tract transfer function characteristics for speech sounds by iterative analysis based on AR-HMM model" Proc. International Conference on Spoken Language Processing. 3. 1721-1724 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Keikichi Hirose, Toshiya Katsura, and Nobuaki Minematsu: "Corpus-based synthesis of F0 contours for emotional speech using the generation process model" Proc. 15th International Congress of Phonetic Sciences. to be published. (2003)
- Description
  From the Research Report Summary (English)
[Publications] Nobuyuki Nishizawa, Nobuaki Minematsu, and Keikichi Hirose: "Formant speech synthesis partly using waveform concatenative synthesis -Experimental study on VCV sounds-" IEICE Technical Report. SP2001-20. 35-42 (2001)
- Description
  From the Research Report Summary (English)
[Publications] Nettre Benjamin, Keikichi Hirose, and Nobuaki Minematsu: "An experimental study on concatenative speech synthesis using a fusion technique and VCV/VV units" IEICE Technical Report. SP2001-121. 53-60 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Keikichi Hirose: "Para- and non-linguistic information in speech information processing" Record of Fall Meeting, Acoust. Soc. Japan. 1. 243-246 (2002)
- Description
  From the Research Report Summary (English)
[Publications] Wentao Gu, Keikichi Hirose, Nobuaki Minematsu: "Considerations on acoustic models for HMM-based Mandarin synthesis" Record of Spring Meeting, Acoust. Soc. Japan. 1, to be published. (2003)
- Description
  From the Research Report Summary (English)
[Publications] Keikichi Hirose, Nobuaki Minematsu and Masaya Eto: "Corpus-based synthesis of fundamental frequency contours for TTS systems based on a generation process model" Yesterday and Today for the Spoken Language Researches, TaeHakSa. 461-477 (2002)
- Description
  From the Research Report Summary (English)
