Budget Amount
¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 1997: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 1996: ¥1,400,000 (Direct Cost: ¥1,400,000)
To elucidate the relation among the anatomical structure of the vocal tract, acoustic parameters, and auditory perception, we developed a new method for measuring three-dimensional vocal tract shapes from magnetic resonance (MR) images of sustained vowels. The dimensions of the vocal tract shapes for five Japanese vowels, phonated by three adult males, three females, and a boy, were measured from the MR images. Differences in vocal tract shape and acoustic parameters among the male, female, and child speakers were systematically investigated. The results showed that the non-uniformity in the dimensions of the vocal tract shape had no essential effect on the acoustic characteristics.
In this research, individual differences in the vocal tract shape of vowels, measured from MR images of males and females, were discussed. Differences in the vocal tract dimensions of the subjects and their effects on acoustic characteristics were investigated. Perceptual similarity tests of vowel quality showed that normalization of vowels from females to males could be achieved by relying largely on the vocal tract length.
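As a rough illustration of normalization by vocal tract length, the sketch below uses the textbook uniform-tube approximation, in which the resonances of a tube closed at the glottis and open at the lips are F_n = (2n - 1)c / (4L). This is not the procedure used in the project; the speed of sound, the tract lengths, and the function names are all assumptions chosen for illustration.

```python
# Assumed speed of sound in the vocal tract (m/s); not a value from the project.
SPEED_OF_SOUND_M_PER_S = 350.0


def uniform_tube_formants(tract_length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_PER_S / (4.0 * tract_length_m)
        for n in range(1, count + 1)
    ]


def normalize_formants(formants_hz, speaker_length_m, reference_length_m):
    """Rescale formants as if they came from a tract of the reference length.

    Formant frequencies scale inversely with tract length in this model,
    so the scale factor is speaker_length / reference_length.
    """
    scale = speaker_length_m / reference_length_m
    return [f * scale for f in formants_hz]


# Hypothetical tract lengths: 14.5 cm (female) and 17.5 cm (male).
female_formants = uniform_tube_formants(0.145)
mapped_to_male = normalize_formants(female_formants, 0.145, 0.175)
```

Under this simplified model, the rescaled female formants coincide with those of a uniform tube of the male length. Real vocal tracts are not uniform, so the sketch only indicates why length can account for much of the difference.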
Vowels of the males were carefully compared at the articulatory and acoustic levels. The results suggested that, for an identical vowel produced by different males, the "invariance" of phonation may lie in the acoustic parameters (the first three formant frequencies F1, F2, and F3, which are important in auditory perception) rather than at the articulatory level. It was also shown that the higher formant frequencies (F4, F5) are stable factors of speaker individuality.
To stably estimate speaker-specific parameters from a speech signal, a source-filter model was introduced to represent speech production, in which a speech signal is regarded as the output of a filter (the vocal tract) excited by a sound source. A novel speech analysis method was proposed that estimates the model parameters using a direct subspace-based state-space system identification algorithm. Experimental results showed that not only the vocal tract parameters, including the higher formant frequencies, but also the source parameters can be estimated quite well.
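The source-filter view can be sketched in the forward (synthesis) direction: an impulse-train source is passed through a cascade of two-pole resonators standing in for the vocal tract. This is not the subspace-based identification method described above, which solves the inverse problem; the formant frequencies, bandwidths, pitch, and function names are illustrative assumptions.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Return (a1, a2) for y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    theta = 2.0 * math.pi * freq_hz / sample_rate        # pole angle
    return -2.0 * r * math.cos(theta), r * r


def impulse_train(f0_hz, sample_rate, n_samples):
    """Idealized glottal source: one unit impulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def apply_resonator(signal, a1, a2):
    """Filter the signal through a single two-pole resonator."""
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, f0_hz=120.0, sample_rate=16000, n_samples=1600):
    """Excite a cascade of formant resonators (the filter) with the source."""
    signal = impulse_train(f0_hz, sample_rate, n_samples)
    for freq_hz, bandwidth_hz in formants:
        a1, a2 = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
        signal = apply_resonator(signal, a1, a2)
    return signal


# Assumed (frequency, bandwidth) pairs in Hz, roughly an /a/-like vowel.
vowel = synthesize_vowel([(700.0, 80.0), (1200.0, 90.0), (2500.0, 120.0)])
```

An analysis method such as the one proposed in the project works in the opposite direction, recovering the source and filter parameters from the observed waveform.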
In addition to speaker recognition, the results of this project are expected to be applicable to almost all areas of speech research, such as synthesis, perception, voice conversion, coding, and recognition.