Project/Area Number |
12480079
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Tokyo |
Principal Investigator |
HIROSE Keikichi Graduate School of Frontier Sciences, Professor (50111472)
|
Co-Investigator (Kenkyū-buntansha) |
MINEMATSU Nobuaki Graduate School of Information Science and Technology, Associate Professor (90273333)
|
Project Period (FY) |
2000 – 2002
|
Project Status |
Completed (Fiscal Year 2002)
|
Budget Amount |
¥10,000,000 (Direct Cost: ¥10,000,000)
Fiscal Year 2002: ¥2,800,000 (Direct Cost: ¥2,800,000)
Fiscal Year 2001: ¥4,400,000 (Direct Cost: ¥4,400,000)
Fiscal Year 2000: ¥2,800,000 (Direct Cost: ¥2,800,000)
|
Keywords | Statistical Speech Synthesis / Terminal Analogue Synthesis / Waveform Concatenative Synthesis / HMM Speech Synthesis / AR-HMM Model / Fundamental Frequency Contour / Generation Process Model / Emotional Speech Synthesis / Waveform Concatenation Synthesis / Glottal Source Waveform Model / Formant Estimation / Statistical Speech Synthesis Method / Segmental Features / Prosodic Features / Dialogue Speech / ARX Analysis / Mora Duration |
Research Abstract |
The original research plan aimed to realize high-quality speech synthesis by using accurate pole-zero information on the vocal transfer function for segmental feature generation and by applying functional model constraints for prosodic feature generation. It was accomplished with the following results:
1. Successive approximation was applied to ARX analysis, enabling accurate pole-zero estimation. The method was combined with our previously developed terminal analogue synthesizer to construct an analysis-synthesis workbench. Using this workbench, we succeeded in improving the quality of liquid sounds.
2. A hybrid speech synthesizer combining terminal analogue synthesis and waveform concatenation was developed, realizing high-quality speech synthesis.
3. A method for stable formant extraction was developed, based on AR-HMM modeling, in which the source waveform is represented by an HMM. A speech synthesis experiment showed that the method could generate high-quality speech even for large changes in F0 (fundamental frequency).
4. By adding the natural waveforms of junction periods to the concatenated speech in the spectral domain with appropriate weighting, we realized smooth spectral transitions. We also developed a method that effectively reduces the corpus size for concatenative synthesis through vector quantization (VQ) weighted according to frequency.
5. After developing an HMM speech synthesizer, we investigated the amount of data needed for speaker adaptation from the viewpoint of speech quality. Good quality was shown to be obtainable with 10 or more sentences.
6. F0 contour generation was realized by estimating the parameters of the generation process model using statistical methods. High speech quality was achieved from only a small speech corpus by using linguistic information such as direct modification relations between words. We also succeeded in estimating accent phrase boundaries from text using the same statistical framework. Furthermore, F0 contour generation and phoneme length estimation were realized for emotional speech with good results.
7. A method for automatically estimating the commands of the F0 contour generation process model was realized. Using this method, a prosodic corpus was built. This corpus is indispensable for the F0 contour generation described above.
8. A rule for controlling mora duration in dialogue-like speech synthesis was constructed. A speech synthesis experiment confirmed the validity of the rule.
|
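The abstract refers to an F0 contour "generation process model" driven by commands but does not give its equations. As an illustration only, the sketch below assumes this means the command-response formulation commonly known as the Fujisaki model, in which log F0 is the sum of a baseline, phrase components (impulse responses) and accent components (clipped step responses). The constants `ALPHA`, `BETA` and `GAMMA`, the function names, and the example command values are all assumptions chosen for illustration and are not taken from the project.

```python
import math

# Typical textbook constants; assumed here, not reported in the abstract.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Return F0 values (Hz) at the given times (s).

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical example: one phrase command and one accent command over 2 s.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.7, 0.4)])
```

Under this assumed formulation, the statistical methods in item 6 would predict the command timings and magnitudes from text, and the automatic estimation in item 7 would recover them from observed F0 contours; neither procedure is sketched here.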