Project/Area Number |
08680386
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Utsunomiya University |
Principal Investigator |
KASUYA Hideki Utsunomiya University, Faculty of Engineering, Professor (20006240)
|
Co-Investigator (Kenkyū-buntansha) |
YANG Chang-Sheng Utsunomiya University, Faculty of Engineering, Assistant (80272219)
|
Project Period (FY) |
1996 – 1997
|
Project Status |
Completed (Fiscal Year 1997)
|
Budget Amount |
¥2,600,000 (Direct Cost: ¥2,600,000)
Fiscal Year 1997: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 1996: ¥1,900,000 (Direct Cost: ¥1,900,000)
|
Keywords | Speech Synthesis / Voice Quality / Individuality / ARX Analysis / Formant / Voice Source Characteristics / Hoarse Voice / Whisper / Formant Synthesis |
Research Abstract |
Flexible voice quality control in speech synthesis covers not only qualities such as whisper, breathy and tense voice but also talker individuality, which results from physiological differences in the speech organs. The major aim of this research project is to establish a basis for realizing such control in speech synthesis. This year we focused on synthesis strategies for generating whisper, breathy, harsh and tense voice qualities as well as various talker individualities, using the ARX (auto-regressive with exogenous input) speech analysis-synthesis method developed last year. For whisper voice, we investigated the acoustic mechanism behind the formant structure specific to whisper and arrived at a new theory that explains the frequency shift of the lower formants, based on MRI (magnetic resonance imaging) measurements of the larynx and computer simulation of the acoustic resonance of the vocal tract. To produce breathy voice, we proposed a method for controlling the voicing source parameters and the amount of laryngeal noise. For harsh voice, we first developed an analysis-conversion-synthesis system that allows us to manipulate jitter, shimmer, spectral fluctuation and laryngeal noise, and then studied the contribution of each of these parameters to the perception of harsh voice. The experiments showed that these parameters interact (cross effects exist among them) in producing harsh voice quality. Tense voice was successfully generated by controlling the open quotient and the spectral tilt of the voicing source waveform. Talker individuality was found to depend largely on the static properties of formant trajectories and less on their dynamics.
|
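The abstract mentions manipulating jitter (cycle-to-cycle period perturbation) and shimmer (cycle-to-cycle amplitude perturbation) when studying harsh voice. As a rough illustration only, and not the project's analysis-conversion-synthesis system, the sketch below perturbs an idealized pulse train. The function name `perturbed_pulse_train`, its parameters, and the choice of uniform random perturbation are all assumptions made for this example; spectral fluctuation and laryngeal noise are not modeled.

```python
import random


def perturbed_pulse_train(f0_hz, duration_s, fs_hz, jitter=0.0, shimmer=0.0, seed=0):
    """Return (sample_index, amplitude) pairs marking pulse onsets.

    jitter  : relative period perturbation per cycle (0.02 means up to +/-2 %)
    shimmer : relative amplitude perturbation per cycle
    Both are drawn from a uniform distribution, which is an assumption
    of this sketch, not something stated in the source.
    """
    rng = random.Random(seed)           # local generator for reproducibility
    mean_period = fs_hz / f0_hz         # mean pitch period in samples
    total = int(duration_s * fs_hz)     # signal length in samples
    pulses = []
    t = 0.0
    while t < total:
        # Amplitude of this cycle, perturbed by shimmer.
        amp = 1.0 + shimmer * rng.uniform(-1.0, 1.0)
        pulses.append((int(round(t)), amp))
        # Length of this cycle, perturbed by jitter.
        t += mean_period * (1.0 + jitter * rng.uniform(-1.0, 1.0))
    return pulses


# Hypothetical usage: 100 Hz source at 16 kHz, with and without perturbation.
clean = perturbed_pulse_train(100.0, 0.1, 16000)
rough = perturbed_pulse_train(100.0, 0.1, 16000, jitter=0.05, shimmer=0.1, seed=1)
```

With both perturbations at zero the pulses are evenly spaced with unit amplitude; raising `jitter` and `shimmer` makes the train progressively more irregular, which is the kind of parameter sweep a perceptual experiment on harsh voice might use.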