Project/Area Number |
07650506
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Measurement and control engineering
|
Research Institution | Kumamoto University |
Principal Investigator |
SONODA Yorinobu Kumamoto University, Faculty of Engineering, Professor (70037836)
|
Co-Investigator(Kenkyū-buntansha) |
OGATA Kohichi Kumamoto University, Graduate School of Science and Technology, Research Associate (10264277)
守 啓祐 Kumamoto University, Faculty of Engineering, Research Associate (10200362)
|
Project Period (FY) |
1995 – 1996
|
Project Status |
Completed (Fiscal Year 1996)
|
Budget Amount |
¥2,100,000 (Direct Cost: ¥2,100,000)
Fiscal Year 1996: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 1995: ¥1,600,000 (Direct Cost: ¥1,600,000)
|
Keywords | magnetic resonance image / vocal tract configuration / speech synthesis / image processing / vocal tract simulator |
Research Abstract |
This project investigates vocal tract configurations estimated from (1) magnetic resonance images (MRIs) and (2) real speech signals. In this term, experiments focused on estimating the vocal tract configuration from real speech signals. A computer simulator modeled on the human speech production mechanism was developed, and the configuration of the tract was estimated with an "Analysis by Synthesis" algorithm. The simulator consists of three parts: vocal source, vocal tract, and lip radiation. The parts were combined in a hybrid system that uses both frequency-domain and time-domain representations, which makes it simple to insert loss terms into the tract model. The vocal tract model consists of 20 cylindrical tubes of equal length and differing cross-sectional area. The five Japanese vowels were synthesized with the simulator, using vocal tract configurations estimated from the MRIs, and their formant frequencies were estimated. The first formant frequencies of the vowels /a/ and /i/ were about 120 Hz and 70 Hz lower, respectively, than those of real speech. Elsewhere, relative errors were within 5%, except for the fourth formant frequency of /i/. Sounds synthesized from shapes estimated from real speech approximated the spectral patterns of the real sounds fairly well, with relative errors within 3%. However, the errors in the first formant frequencies of /a/, /u/, and /e/ were relatively large, ranging from 8% to 9%.
|
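The abstract describes a vocal tract model built from 20 equal-length cylindrical tubes with different cross-sectional areas. As a rough illustration of how formant frequencies follow from such an area function, the sketch below chains lossless tube sections in the frequency domain. It is not the project's simulator: it assumes no wall or viscous losses, an ideal open end at the lips (zero radiation impedance), and illustrative values for the speed of sound and air density. The names `chain_d` and `formants` are hypothetical.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, moist air
AIR_DENSITY = 1.14      # kg/m^3, assumed value


def chain_d(areas, section_length, freq):
    """Return the D element of the glottis-to-lips chain (ABCD) matrix.

    areas: cross-sectional areas in m^2, ordered from glottis to lips.
    With an ideal open end (lip pressure = 0), the volume velocities obey
    U_glottis = D * U_lips, so resonances occur where D = 0.
    """
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND  # wavenumber
    cs = math.cos(k * section_length)
    sn = math.sin(k * section_length)
    a, b, c, d = 1.0 + 0j, 0j, 0j, 1.0 + 0j    # identity matrix
    for area in areas:
        z = AIR_DENSITY * SPEED_OF_SOUND / area  # characteristic impedance
        ta, tb, tc, td = cs, 1j * z * sn, 1j * sn / z, cs
        # Right-multiply the running product by this section's matrix.
        a, b, c, d = (a * ta + b * tc, a * tb + b * td,
                      c * ta + d * tc, c * tb + d * td)
    return d.real  # D is purely real for a lossless tube


def formants(areas, section_length, f_max=5000.0, step=1.0):
    """Locate zeros of D on a frequency grid (sign change + interpolation)."""
    found = []
    f_prev = step
    d_prev = chain_d(areas, section_length, f_prev)
    for i in range(2, int(f_max / step) + 1):
        f = i * step
        d_cur = chain_d(areas, section_length, f)
        if d_cur == 0.0:
            found.append(f)
        elif d_prev * d_cur < 0.0:
            # Linear interpolation between the two grid points.
            found.append(f_prev + step * d_prev / (d_prev - d_cur))
        f_prev, d_prev = f, d_cur
    return found


if __name__ == "__main__":
    # Uniform 17.5 cm tube split into 20 sections of 5 cm^2 each.
    uniform = [5.0e-4] * 20
    print([round(f) for f in formants(uniform, 0.175 / 20)])
```

For a uniform tube the resonances should fall at odd multiples of c/(4L), that is, near 500, 1500, and 2500 Hz for the assumed values, which gives a simple sanity check. Replacing `uniform` with an area function measured from MRI data would give the corresponding lossless formant estimates. Loss terms and a lip radiation load, which the project's simulator includes, would shift and broaden these resonances.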