1999 Fiscal Year Final Research Report Summary
Naturally Sounding Speech Synthesis and Recognition Based on the Formulation of Prosody
Project/Area Number | 09480061 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | The University of Tokyo |
Principal Investigator | HIROSE Keikichi, Professor, Dept. of Frontier Informatics, Graduate School of Frontier Sciences, Univ. of Tokyo (50111472) |
Co-Investigator (Kenkyū-buntansha) | MINEMATSU Nobuaki, Assistant, Dept. of Inf. & Computer Science, Faculty of Engineering, Toyohashi Univ. of Tech. (90273333) |
Project Period (FY) | 1997 – 1999 |
Keywords | Prosodic Features / Speech Synthesis / Speech Recognition / Dialogue-style Speech / Emotional Speech / Statistic Model of Moraic Transition / Prosodic Word Boundary / Dynamic Pruning |
Research Abstract |
Several results, including the following, were achieved in this study, which aimed to formulate the relationship between the prosodic features of speech and linguistic and para/non-linguistic information, and to realize advanced speech synthesis and recognition technologies:
1. The accuracy of automatic extraction of phrase component onsets from fundamental frequency (F0) contours was improved by suppressing accent components through low-pass filtering of the contours and then taking their deviations. Accuracy was further improved in a method of automatic prosodic labeling that uses knowledge of prosody obtainable from linguistic information as constraints on F0 parameter estimation.
2. Mora duration rules were constructed for dialogue-like speech synthesis. These rules basically convert each mora duration of reading-style speech into that of dialogue-like speech on a prosodic-phrase basis, with prosodic phrases defined by the F0 contours.
3. The prosodic features of speech expressing various attitudes/emotions were analyzed. Speakers were found to selectively control several prosodic cues to express the degree of an attitude/emotion. A perceptual experiment further showed that control of segmental features, in addition to prosody, is indispensable for realizing emotional speech.
4. A method was developed that represents the F0 contours of prosodic words by codes in mora units and models their transitions statistically (Statistic model of moraic transition). Prosodic word boundaries were detected at rates of 70-75%, with insertion errors of 11-15%. Applying the method to continuous speech recognition improved mora recognition rates by a few percent. A method was also developed to generate sentence F0 contours from input accent types and phrase boundary positions.
5. A prosodic feature-based method was developed for dynamic pruning in the beam search of large-vocabulary continuous speech recognition. The method widens the beam at prosodic boundaries and narrows it between them, and was shown to reduce the search space to a quarter of its size without degrading recognition rates. A method was also developed to select phoneme models with various context dependencies using prosodic boundary information.
6. Based on these results, a spoken dialogue system for academic information retrieval was developed and evaluated.
|
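The "phrase component" and "accent components" in item 1 appear to refer to the command-response (Fujisaki) model, in which ln F0(t) is the sum of a base value ln Fb, phrase components Ap·Gp(t − T0) with Gp(t) = α²t·exp(−αt), and accent components Aa·[Ga(t − T1) − Ga(t − T2)] with Ga(t) = min[1 − (1 + βt)·exp(−βt), γ]. Assuming that model, the minimal Python/NumPy sketch below generates a log-F0 contour and then estimates phrase onsets by low-pass filtering and frame differencing, reading "deviations" as frame-to-frame differences. This is an interpretation, not the project's published algorithm; all function names, the moving-average filter, the 21-frame window, the threshold and the α, β, γ values are hypothetical, illustrative choices.

```python
import numpy as np

# Hypothetical sketch: command-response (Fujisaki-style) log-F0 generation and a
# simple phrase-onset detector. Parameter values are illustrative assumptions.

def phrase_response(t, alpha):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, and 0 for t < 0."""
    t = np.maximum(t, 0.0)
    return alpha ** 2 * t * np.exp(-alpha * t)

def accent_response(t, beta, gamma):
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    t = np.maximum(t, 0.0)
    return np.minimum(1.0 - (1.0 + beta * t) * np.exp(-beta * t), gamma)

def fujisaki_log_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + sum of phrase components + sum of accent components."""
    log_f0 = np.full(t.shape, np.log(fb))
    for t0, ap in phrase_cmds:          # (onset time T0, amplitude Ap)
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:      # (onset T1, offset T2, amplitude Aa)
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return log_f0

def detect_phrase_onsets(log_f0, frame_shift, win_frames=21, threshold=0.002):
    """Estimate phrase onset times (s); assumes frame i lies at time i * frame_shift."""
    # Frame-to-frame change of log F0. Filtering the differences equals differencing
    # the filtered contour away from the edges (both operations are linear), and it
    # avoids the zero-padding artefacts that filtering ln F0 directly would cause.
    diff = np.diff(log_f0, prepend=log_f0[0])
    kernel = np.ones(win_frames) / win_frames        # centred moving-average low-pass
    smooth = np.convolve(diff, kernel, mode="same")  # slow rises survive, short accents are damped
    half = win_frames // 2
    onsets = []
    for i in range(1, len(smooth) - 1):
        # A local maximum of the smoothed rise rate above the threshold marks a phrase rise.
        if smooth[i] > threshold and smooth[i] >= smooth[i - 1] and smooth[i] > smooth[i + 1]:
            # The centred window peaks about half a window after the actual onset.
            onsets.append((i - half) * frame_shift)
    return onsets
```

For example, with `t = np.arange(0.0, 3.0, 0.01)` and a single phrase command at 1.0 s, `detect_phrase_onsets(fujisaki_log_f0(t, 80.0, [(1.0, 0.5)], []), 0.01)` returns one onset close to 1.0 s. When accent commands are present, a longer window damps them more strongly but blurs the onset position, so the window length is a trade-off.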
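Item 4 codes the F0 contour in mora units and models code transitions statistically, but the summary gives neither the code inventory nor the model order. The Python sketch below is therefore a hypothetical stand-in. Each mora gets a three-level code (U/F/D for a rise, flat or fall in log F0 relative to the previous mora, with an assumed 0.05 threshold). Separate bigram tables are trained for transitions that cross a prosodic word boundary and for those that do not, and a boundary is hypothesised wherever the boundary class is more probable. The scoring function treats both the detection rate and the insertion-error rate as fractions of the reference boundary count, which is also an assumption about how such figures are usually reported.

```python
import math
from collections import Counter

# Hypothetical sketch: mora-unit F0 codes plus class-conditional bigram models used
# to decide whether a prosodic word boundary precedes each mora.
CODES = "UFD"  # Up / Flat / Down relative to the previous mora (assumed coding)

def mora_codes(mora_f0, step=0.05):
    """Quantize the log-F0 change between consecutive morae (Hz values) into U/F/D."""
    codes = ["F"]  # the first mora has no predecessor, so it is treated as flat
    for prev, cur in zip(mora_f0, mora_f0[1:]):
        diff = math.log(cur) - math.log(prev)
        codes.append("U" if diff > step else "D" if diff < -step else "F")
    return codes

def train(corpus):
    """Count code bigrams separately for boundary and non-boundary positions.

    corpus: iterable of (codes, starts); starts holds indices of morae that begin
    a new prosodic word (index 0 is ignored because it has no predecessor)."""
    counts = {True: Counter(), False: Counter()}
    for codes, starts in corpus:
        for i in range(1, len(codes)):
            counts[i in starts][(codes[i - 1], codes[i])] += 1
    return counts

def transition_prob(counts, boundary, prev, cur):
    """P(cur | prev, boundary class) with add-one smoothing over the code set."""
    table = counts[boundary]
    total = sum(table[(prev, c)] for c in CODES)
    return (table[(prev, cur)] + 1) / (total + len(CODES))

def detect_boundaries(counts, codes, prior=0.3):
    """Return the mora indices where 'boundary' scores higher than 'no boundary'."""
    found = set()
    for i in range(1, len(codes)):
        p_b = prior * transition_prob(counts, True, codes[i - 1], codes[i])
        p_n = (1 - prior) * transition_prob(counts, False, codes[i - 1], codes[i])
        if p_b > p_n:
            found.add(i)
    return found

def score(reference, hypothesis):
    """Detection rate and insertion-error rate, both relative to the reference count."""
    detection = len(reference & hypothesis) / len(reference)
    insertion = len(hypothesis - reference) / len(reference)
    return detection, insertion
```

In a toy run, `train([(list("FUFDUFDD"), {4}), (list("FUDDUFD"), {4})])` sees only D→U transitions at boundaries. Applied to `list("FUFDUDDUF")`, it hypothesises boundaries before morae 4 and 7. Scoring against a reference of `{4, 6}` then gives a detection rate of 0.5 and an insertion-error rate of 0.5. Real use would need far more training data and a richer code set.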
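Item 5 describes dynamic pruning: the beam is widened at prosodic boundaries and narrowed between them. The Python sketch below shows only the pruning step under that idea, using standard score-threshold beam pruning. A hypothesis survives if its log score lies within the beam of the best score at that frame. The beam width depends on whether the frame lies within a margin of a detected boundary frame. The names, the beam values 200 and 50, and the 5-frame margin are hypothetical. A real decoder would also expand hypotheses frame by frame, which is omitted here.

```python
# Hypothetical sketch: beam pruning whose width depends on proximity to prosodic
# boundaries (wide near a boundary, narrow between boundaries).

def beam_width(frame, boundary_frames, wide=200.0, narrow=50.0, margin=5):
    """Log-score beam for this frame: wide within `margin` frames of a boundary."""
    near = any(abs(frame - b) <= margin for b in boundary_frames)
    return wide if near else narrow

def prune(hyps, beam):
    """Keep (log_score, label) hypotheses within `beam` of the best log score."""
    best = max(score for score, _ in hyps)
    return [(score, label) for score, label in hyps if score >= best - beam]

def surviving_counts(frames, boundary_frames, **kw):
    """Number of hypotheses kept at each frame; frames is a list of hypothesis lists."""
    return [len(prune(hyps, beam_width(f, boundary_frames, **kw)))
            for f, hyps in enumerate(frames)]
```

For instance, with hypotheses scored −10, −70 and −300, `prune` keeps one hypothesis under the narrow beam of 50 and two under the wide beam of 200. Narrowing the beam away from boundaries is where the search-space saving comes from. The reduction to a quarter reported in item 5 is the project's result and is not reproduced by this toy.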