1999 Fiscal Year Final Research Report Summary

Naturally Sounding Speech Synthesis and Recognition Based on the Formulation of Prosody

Research Project

Project/Area Number	09480061
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	HIROSE Keikichi, Dept. of Frontier Informatics, Graduate School of Frontier Sciences, Univ. of Tokyo, Professor (50111472)
Co-Investigator(Kenkyū-buntansha)	MINEMATSU Nobuaki, Dept. of Inf. & Computer Science, Faculty of Engineering, Toyohashi Univ. of Tech., Assistant (90273333)
Project Period (FY)	1997 – 1999
Keywords	Prosodic Features / Speech Synthesis / Speech Recognition / Dialogue-style Speech / Emotional Speech / Statistical Model of Moraic Transition / Prosodic Word Boundary / Dynamic Pruning
Research Abstract	This study aimed to formulate the relationship between the prosodic features of speech and linguistic and para-/non-linguistic information, and to realize advanced speech synthesis technologies on that basis. The main results were as follows:
1. The accuracy of automatically extracting phrase component onsets from fundamental frequency (F0) contours was improved by suppressing the accent components through low-pass filtering of the contours and then taking their deviations. Accuracy was further improved in a method of automatic prosodic labeling that uses prosodic knowledge obtainable from linguistic information as constraints on F0 parameter estimation.
2. Mora duration rules were constructed for dialogue-like speech synthesis. The rules basically convert the duration of each mora in reading-style speech into that of dialogue-like speech on a prosodic-phrase basis, with prosodic phrases defined from the F0 contours.
3. Prosodic features of speech expressing various attitudes and emotions were analyzed. Speakers were found to control several prosodic cues selectively to express the degree of an attitude or emotion. A perceptual experiment also showed that control of segmental features is also indispensable for realizing emotional speech.
4. A method was developed that represents the F0 contours of prosodic words as codes in mora units and models their transitions statistically (the statistical model of moraic transition). Prosodic word boundaries were detected at rates of 70-75% with insertion errors of 11-15%. Applied to continuous speech recognition, the method improved mora recognition rates by a few percent. A method was also developed to generate sentence F0 contours from accent types and phrase boundary positions.
5. A prosody-based method was developed for dynamic pruning in the beam search of large-vocabulary continuous speech recognition. The method widens the beam at prosodic boundaries and narrows it between them. The search space was shown to be reducible to a quarter without degrading recognition rates. A method was also developed to select phoneme models with various context dependencies using prosodic boundary information.
6. Based on these results, a spoken dialogue system for academic information retrieval was developed and evaluated.
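Item 1 can be illustrated with a minimal Python sketch. It assumes a Fujisaki-style superpositional model in which each phrase component is the response α²t·e^(−αt) to a phrase command (the report itself gives no equations), a centered moving average as the low-pass filter, a 10 ms frame shift, and it reads "taking their deviations" as taking frame-to-frame differences of the filtered contour. All function names, the window width and the slope threshold are illustrative assumptions, not values from the study; accent components are left out of the synthetic contour for brevity.

```python
import math
from typing import List, Sequence, Tuple

FRAME_S = 0.01  # assumed 10 ms frame shift


def phrase_response(t: float, alpha: float = 3.0) -> float:
    """Fujisaki-style phrase response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0 (assumed model)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def synth_log_f0(n_frames: int, fb: float, phrases: Sequence[Tuple[float, float]]) -> List[float]:
    """ln F0 = ln Fb + sum of phrase components; each phrase is (onset time in s, amplitude Ap)."""
    contour = []
    for i in range(n_frames):
        t = i * FRAME_S
        contour.append(math.log(fb) + sum(ap * phrase_response(t - t0) for t0, ap in phrases))
    return contour


def moving_average(x: Sequence[float], width: int) -> List[float]:
    """Centered moving average used as a crude low-pass filter (window truncated at the edges)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def phrase_onset_candidates(log_f0: Sequence[float], width: int = 15, threshold: float = 0.01) -> List[int]:
    """Frames where the slope of the smoothed contour rises through `threshold` (hypothetical rule)."""
    smooth = moving_average(log_f0, width)
    slope = [b - a for a, b in zip(smooth, smooth[1:])]  # frame-to-frame differences
    return [i for i in range(1, len(slope)) if slope[i - 1] <= threshold < slope[i]]


# Synthetic 3 s contour with phrase commands at 0.3 s and 1.5 s (frames 30 and 150).
contour = synth_log_f0(300, fb=100.0, phrases=[(0.3, 0.5), (1.5, 0.4)])
print(phrase_onset_candidates(contour))
```

On this synthetic contour the two candidates should land close to frames 30 and 150, slightly early, because the centered window starts to register each rise before the command itself.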
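Item 4 encodes F0 contours as mora-unit codes and models their transitions statistically. Publications in this report describe HMM-based mora-transition models; the Python sketch below substitutes a much simpler first-order Markov chain (bigram) over a hypothetical three-level code (up, down or flat change in mean log F0 from one mora to the next), with a boundary symbol treated as an ordinary token. A boundary is hypothesized between two codes when the path through the boundary symbol is more probable than the direct transition. The code alphabet, the 0.05 quantization step, the add-one smoothing and all names are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

BOUNDARY = "|"  # hypothetical prosodic word boundary token


def quantize(delta: float, step: float = 0.05) -> str:
    """Map a mora-to-mora change in mean log F0 to a coarse code (assumed 3-level scheme)."""
    if delta > step:
        return "U"  # rising
    if delta < -step:
        return "D"  # falling
    return "F"  # roughly flat


def to_codes(mora_log_f0: Sequence[float]) -> List[str]:
    """One code per transition between consecutive morae."""
    return [quantize(b - a) for a, b in zip(mora_log_f0, mora_log_f0[1:])]


def train_bigram(corpus: Sequence[Sequence[str]]) -> Dict[Tuple[str, str], float]:
    """Add-one smoothed bigram probabilities P(b | a) over codes plus the boundary token."""
    vocab = sorted({tok for seq in corpus for tok in seq} | {"U", "D", "F", BOUNDARY})
    pairs: Counter = Counter()
    left: Counter = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            left[a] += 1
    v = len(vocab)
    return {(a, b): (pairs[(a, b)] + 1) / (left[a] + v) for a in vocab for b in vocab}


def detect_boundaries(codes: Sequence[str], prob: Dict[Tuple[str, str], float]) -> List[int]:
    """Return positions k where a boundary before codes[k] beats the direct transition."""
    found = []
    for k in range(1, len(codes)):
        a, b = codes[k - 1], codes[k]
        via_boundary = math.log(prob[(a, BOUNDARY)]) + math.log(prob[(BOUNDARY, b)])
        direct = math.log(prob[(a, b)])
        if via_boundary > direct:
            found.append(k)
    return found


# Toy training data: prosodic words that rise, level off and fall, separated by boundaries.
model = train_bigram([["U", "F", "D", BOUNDARY, "U", "F", "D", BOUNDARY, "U", "D"]] * 3)
print(detect_boundaries(["U", "F", "D", "U", "F", "D"], model))
```

With this toy model the only hypothesized boundary falls before the second rising code, since a fall followed directly by a rise never occurs inside a training word.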
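Item 5 widens the beam at prosodic boundaries and narrows it between them. The Python sketch below shows only that pruning rule, assuming log-domain hypothesis scores, boundary positions already available as frame indices, and hypothetical beam widths (50 and 150) with a ±5-frame neighborhood; none of these values come from the report, and a real decoder would call the rule inside its frame-synchronous search loop.

```python
from typing import Dict, Set


def beam_width(frame: int, boundary_frames: Set[int],
               narrow: float = 50.0, wide: float = 150.0, margin: int = 5) -> float:
    """Wide beam within `margin` frames of a detected prosodic boundary, narrow elsewhere (assumed values)."""
    near_boundary = any(abs(frame - b) <= margin for b in boundary_frames)
    return wide if near_boundary else narrow


def prune(hyps: Dict[str, float], beam: float) -> Dict[str, float]:
    """Keep hypotheses whose log score lies within `beam` of the best score."""
    if not hyps:
        return {}
    best = max(hyps.values())
    return {h: s for h, s in hyps.items() if s >= best - beam}


hypotheses = {"a": -100.0, "b": -140.0, "c": -200.0}
print(sorted(prune(hypotheses, beam_width(10, {100}))))  # far from the boundary: narrow beam
print(sorted(prune(hypotheses, beam_width(98, {100}))))  # near the boundary: wide beam
```

Keeping the beam narrow for most frames is what shrinks the search space, while the wider beam near boundaries gives competing word hypotheses room to survive where word transitions are most likely.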

Research Products
(48 results)

[Publications] 広瀬啓吉: "Analysis of intonation in emotional speech" Proc. ESCA Tutorial and Research Workshop on Intonation: Theory, Models and Applications. 185-188 (1997)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 倪晋富: "Quantitative analysis and formulation of tone concatenation in Chinese F0 contours" Proc. European Conf. on Speech Communication and Technology. 1. 195-198 (1997)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "A method of representing fundamental frequency contours of Japanese using statistical models of moraic transition" Proc. European Conf. on Speech Communication and Technology. 1. 311-314 (1997)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "Accent type recognition and syntactic boundary detection of Japanese using statistical modeling of moraic transitions of fundamental frequency contours" Proc. IEEE International Conf. on Acoustics, Speech, & Signal Processing. 1. 25-28 (1998)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "Processing of prosodic information" Journal of Signal Processing. 2, 6. 415-423 (1998)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "On the relationship of speech rates with prosodic units in dialogue speech" Proc. International Conf. on Spoken Language Processing. 5. 1979-1982 (1998)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 桜井淳宏: "A linguistic and prosodic database for data-driven Japanese TTS synthesis" Proc. International Conf. on Spoken Language Processing. 7. 2843-2846 (1998)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "Detection of phrase boundary changes by comparing observed and model-generated fundamental frequency contours" Journal for the Integrated Study of Artificial Intelligence, Cognitive Science and Applied Epistemology. 15, 3. 235-253 (1998)
- Description
  From the Final Research Report Summary (Japanese version)
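[Publications] 峯松信明: "Analysis and modeling of spectral variations caused by F0 changes" J. Acoustical Society of Japan. 55, 3. 165-174 (1999)
- Description
  From the Final Research Report Summary (Japanese version)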
[Publications] 岩野公司: "Prosodic word boundary detection using statistical modeling of moraic fundamental frequency contours and its use for continuous speech recognition" Proc. IEEE International Conf. on Acoustics, Speech, & Signal Processing. 1. 133-136 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
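[Publications] 岩野公司: "A statistical modeling of fundamental frequency contours in moraic unit and its use for the detection of prosodic word boundaries" Trans. Information Processing Society of Japan. 40, 4. 1356-1364 (1999)
- Description
  From the Final Research Report Summary (Japanese version)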
[Publications] 桜井淳宏: "Designing a parameter-based prosodic speech database" Proc. Oriental COCOSDA Workshop. 5-8 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "Statistical modeling of prosodic features and its use for speech recognition" Proc. International Conf. on Speech Processing. 1. 43-52 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "Generation of speech reply in a spoken dialogue system for literature retrieval" Proc. ESCA Tutorial and Research Workshop on Interactive Dialogue in Multi-Modal Systems. 29-32 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 川波弘道: "Speech rate control for dialogue speech synthesis based on the prosodic structures" Proc. ESCA Tutorial and Research Workshop on Dialogue and Prosody. 59-64 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "Tone recognition of Chinese continuous speech using tone critical segments" Proc. European Conf. on Speech Communication and Technology. 2. 879-882 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 桜井淳宏: "Detecting accent sandhi in Japanese using a superpositional F0 model" Proc. European Conf. on Speech Communication and Technology. 4. 1863-1866 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
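[Publications] 峯松信明: "Automatic detection of stressed syllables in English words using HMMs and its application to prosodic evaluation of pronunciation proficiency" Trans. Institute of Electronics, Information and Communication Engineers. J82-D-II, 11. 1865-1876 (1999)
- Description
  From the Final Research Report Summary (Japanese version)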
[Publications] 李時旭: "Dynamic beam search strategy using prosodic-syntactic information" Proc. IEEE Workshop on Automatic Speech Recognition and Understanding. 189-192 (1999)
- Description
  From the Final Research Report Summary (Japanese version)
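[Publications] 桜井淳宏: "F0 contour modeling and generation based on mora-transition HMM" Proc. Meeting of the Acoustical Society of Japan. I (to be published). (2000)
- Description
  From the Final Research Report Summary (Japanese version)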
[Publications] 広瀬啓吉: "Detection of prosodic word boundaries by statistical modeling of mora transitions of fundamental frequency contours and its use for continuous speech recognition" Proc. IEEE International Conf. on Acoustics, Speech, & Signal Processing. (to be published). (2000)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 張勁松: "Anchoring hypothesis and its application to tone recognition of Chinese continuous speech" Proc. IEEE International Conf. on Acoustics, Speech, & Signal Processing. (to be published). (2000)
- Description
  From the Final Research Report Summary (Japanese version)
[Publications] 広瀬啓吉: "Disambiguating recognition results by prosodic features" In Computing Prosody, Chapter IV-21. Springer-Verlag. 401 pp. (chapter: 16 pp.) (1997)
- Description
  From the Final Research Report Summary (Japanese version)
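[Publications] 広瀬啓吉: "Generation of dialogue speech" In Spoken Dialogue between Man and Machine, Chapter 4. Ohmsha. 375 pp. (chapter: 14 pp.) (1998)
- Description
  From the Final Research Report Summary (Japanese version)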
[Publications] Keikichi Hirose, Hiromichi Kawanami and Nobuyuki Ihara: "Analysis of intonation in emotional speech" Proc. ESCA Tutorial and Research Workshop on Intonation: Theory, Models and Applications. 185-188 (1997)
- Description
  From the Final Research Report Summary (English version)
[Publications] Jin-Fu Ni, Ren-Hua Wang and Keikichi Hirose: "Quantitative analysis and formulation of tone concatenation in Chinese F0 contours" Proc. European Conference on Speech Communication and Technology. 1. 195-198 (1997)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose and Kouji Iwano: "A method of representing fundamental frequency contours of Japanese using statistical models of moraic transition" Proc. European Conference on Speech Communication and Technology. 1. 311-314 (1997)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose and Kouji Iwano: "Accent type recognition and syntactic boundary detection of Japanese using statistical modeling of moraic transitions of fundamental frequency contours" Proc. IEEE International Conference on Acoustics, Speech, & Signal Processing. 1. 25-28 (1998)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose: "Processing of prosodic information" Journal of Signal Processing. 2, 6. 415-423 (1998)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose and Hiromichi Kawanami: "On the relationship of speech rates with prosodic units in dialogue speech" Proc. International Conference on Spoken Language Processing. 5. 1979-1982 (1998)
- Description
  From the Final Research Report Summary (English version)
[Publications] Atsuhiro Sakurai, Takashi Natsume and Keikichi Hirose: "A linguistic and prosodic database for data-driven Japanese TTS synthesis" Proc. International Conference on Spoken Language Processing. 7. 2843-2846 (1998)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose and Atsuhiro Sakurai: "Detection of phrase boundary changes by comparing observed and model-generated fundamental frequency contours" J. for the Integrated Study of Artificial Intelligence, Cognitive Science and Applied Epistemology. 15, 3. 235-253 (1998)
- Description
  From the Final Research Report Summary (English version)
[Publications] Nobuaki Minematsu and Seiichi Nakagawa: "Analysis and modeling of spectral variations caused by F0 changes" J. Acoustical Society of Japan. 55, 3. 165-174 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Koji Iwano and Keikichi Hirose: "Prosodic word boundary detection using statistical modeling of moraic fundamental frequency contours and its use for continuous speech recognition" Proc. IEEE International Conference on Acoustics, Speech, & Signal Processing. 1. 133-136 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Koji Iwano and Keikichi Hirose: "A statistical modeling of fundamental frequency contours in moraic unit and its use for the detection of prosodic word boundaries" Trans. Information Processing Society of Japan. 40, 4. 1356-1364 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Atsuhiro Sakurai and Keikichi Hirose: "Designing a parameter-based prosodic speech database" Proc. Oriental COCOSDA Workshop. 5-8 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose: "Statistical modeling of prosodic features and its use for speech recognition" Proc. International Conference on Speech Processing. 1. 43-52 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose and Shinya Kiriyama: "Generation of speech reply in a spoken dialogue system for literature retrieval" Proc. ESCA Tutorial and Research Workshop on Interactive Dialogue in Multi-Modal Systems. 29-32 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Hiromichi Kawanami and Keikichi Hirose: "Speech rate control for dialogue speech synthesis based on the prosodic structures" Proc. ESCA Tutorial and Research Workshop on Dialogue and Prosody. 59-64 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose and Jinsong Zhang: "Tone recognition of Chinese continuous speech using tone critical segments" Proc. European Conference on Speech Communication and Technology. 2. 879-882 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Atsuhiro Sakurai, Hiromichi Kawanami and Keikichi Hirose: "Detecting accent sandhi in Japanese using a superpositional F0 model" Proc. European Conference on Speech Communication and Technology. 4. 1863-1866 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Nobuaki Minematsu, Yukiko Fujisawa and Seiichi Nakagawa: "Automatic detection of stressed syllables in English words using HMMs and its application to prosodic evaluation of pronunciation proficiency" Trans. Institute of Electronics, Information and Communication Engineers. J82-D-II, 11. 1865-1876 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Shi-wook Lee and Keikichi Hirose: "Dynamic beam search strategy using prosodic-syntactic information" Proc. IEEE Workshop on Automatic Speech Recognition and Understanding. 189-192 (1999)
- Description
  From the Final Research Report Summary (English version)
[Publications] Atsuhiro Sakurai, Koji Iwano and Keikichi Hirose: "F0 contour modeling and generation based on mora-transition HMM" Record of Spring Meeting, Acoust. Soc. Japan. I (to be published). (2000)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose and Koji Iwano: "Detection of prosodic word boundaries by statistical modeling of mora transitions of fundamental frequency contours and its use for continuous speech recognition" Proc. IEEE International Conference on Acoustics, Speech, & Signal Processing. (to be published). (2000)
- Description
  From the Final Research Report Summary (English version)
[Publications] Jin-song Zhang: "Anchoring hypothesis and its application to tone recognition of Chinese continuous speech" Proc. IEEE International Conference on Acoustics, Speech, & Signal Processing. (to be published). (2000)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose: "Disambiguating recognition results by prosodic features" In Computing Prosody. Springer-Verlag. 327-342 (1997)
- Description
  From the Final Research Report Summary (English version)
[Publications] Keikichi Hirose: "Generation of dialogue speech" In Spoken Dialogue between Man and Machine, Chapter 4. Ohmsha. 67-80 (1998)
- Description
  From the Final Research Report Summary (English version)
