Co-Investigator (Kenkyū-buntansha)
OHKUBO Masaki, College of Biomedical Technology, Niigata University, Research Associate (10203738)
KIRYU Tohru, Faculty of Engineering, Niigata University, Assistant Professor (80115021)
Budget Amount
¥1,900,000 (Direct Cost: ¥1,900,000)
Fiscal Year 1991: ¥500,000 (Direct Cost: ¥500,000)
Fiscal Year 1990: ¥1,400,000 (Direct Cost: ¥1,400,000)
Research Abstract
The transient parts of speech signals have attracted great interest in speech recognition in recent years. Furui [1986] proposed delta cepstrum coefficients to extract the dynamic characteristics of speech signals and achieved higher recognition rates than earlier methods. Note, however, that conventional approaches, including those that use dynamic characteristics, rely on linear prediction, which models the formants of speech signals. We instead employed a structural model, the natural observation system, for speech modeling, together with a geometrical interpretation to classify the types of transient parts.

A natural observation filter (NOF) consists of a cascade of first-order filters with the same cutoff frequency: a first-stage low-pass filter followed by high-pass filters. Given appropriate coefficients, the NOF can reconstruct an input signal as a linear combination of the outputs of the individual first-order stages; we call these coefficients natural observation coefficients (NOCs). Because each NOC is a function of the cutoff frequency, a signal can be analyzed through the time courses of its NOCs over multiple cutoff frequencies (the NOC pattern). We applied NOC patterns to Japanese consonants (fricatives and stops) to study the transient parts of speech signals, and obtained distinguishable NOC patterns that probably reflect the instantaneous structure of these consonants and of the vowels that follow them.

Separately, we applied a geometrical interpretation to the transient parts. A two-dimensional plane in the p-dimensional Hilbert space was determined so that the trajectory of the coefficient vector is as straight as possible. We compared three types of parameters: linear prediction coefficients, LPC cepstrum coefficients, and NOCs, the last of which are structured parameters. NOCs traced straighter trajectories than the other two. For LPC cepstrum coefficients, a trajectory that progresses smoothly toward the following vowel contributed to the high recognition rate.
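The natural observation filter described in the abstract can be sketched in code. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the first-order filters were discretized, so the sketch assumes one-pole sections derived from an RC prototype, with each frame filtered from zero initial state. It also only says "appropriate coefficients", so the NOCs here are assumed to be least-squares fits over one frame (with a tiny ridge term to keep the normal equations solvable). All function names (`nof_stages`, `noc`, `noc_pattern`, and so on) are hypothetical. The low-pass and high-pass sections are chosen to be exactly complementary, so their outputs sum to the input.

```python
import math


def one_pole_alpha(fc, fs):
    """Smoothing factor of an RC section with cutoff fc, discretized at sample rate fs (assumed scheme)."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * fc)
    return dt / (rc + dt)


def lowpass(x, alpha):
    """First-order low-pass, zero initial state: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y, prev = [], 0.0
    for v in x:
        prev += alpha * (v - prev)
        y.append(prev)
    return y


def highpass(x, alpha):
    """Complementary first-order high-pass: y[n] = (1 - alpha) * (y[n-1] + x[n] - x[n-1])."""
    y, prev_y, prev_x = [], 0.0, 0.0
    for v in x:
        prev_y = (1.0 - alpha) * (prev_y + v - prev_x)
        prev_x = v
        y.append(prev_y)
    return y


def nof_stages(x, fc, fs, n_stages):
    """Output of every stage: one low-pass followed by (n_stages - 1) high-passes, all at cutoff fc."""
    alpha = one_pole_alpha(fc, fs)
    outputs = [lowpass(x, alpha)]
    for _ in range(n_stages - 1):
        outputs.append(highpass(outputs[-1], alpha))
    return outputs


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ c = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        c[i] = (m[i][n] - sum(m[i][k] * c[k] for k in range(i + 1, n))) / m[i][i]
    return c


def noc(x, fc, fs, n_stages=4, ridge=1e-9):
    """Hypothetical NOC fit: coefficients c_k with x ~ sum_k c_k * u_k, by (ridge) least squares."""
    u = nof_stages(x, fc, fs, n_stages)
    g = [[sum(p * q for p, q in zip(ui, uj)) for uj in u] for ui in u]
    for i in range(n_stages):
        g[i][i] += ridge  # keeps the Gram matrix positive definite
    b = [sum(p * q for p, q in zip(ui, x)) for ui in u]
    return solve(g, b)


def noc_pattern(frames, cutoffs, fs, n_stages=4):
    """Time course of NOCs: for each frame, one coefficient vector per cutoff frequency."""
    return [{fc: noc(frame, fc, fs, n_stages) for fc in cutoffs} for frame in frames]
```

Splitting an utterance into frames and passing them to `noc_pattern` gives, for each cutoff, a sequence of coefficient vectors over time, which is one way to realize the multi-cutoff time courses the abstract describes. Fitting frame by frame from zero state is a simplification; a running filter across frames would be an equally plausible reading.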
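The geometrical interpretation can also be illustrated, but the abstract gives only the goal (a two-dimensional plane in which the coefficient trajectory is as straight as possible), not the procedure. This sketch therefore swaps in a simpler, plainly different rule and should not be read as the authors' method: it treats the p-dimensional space as ordinary Euclidean space R^p, fits the least-squares plane through the trajectory by principal component analysis (power iteration, to stay within the standard library), and then scores straightness as chord length divided by path length in that plane. Function names such as `project_to_plane` and `straightness` are hypothetical.

```python
import math
import random


def mean_center(points):
    """Subtract the mean vector from every point of the trajectory."""
    p = len(points[0])
    mu = [sum(v[i] for v in points) / len(points) for i in range(p)]
    return [[v[i] - mu[i] for i in range(p)] for v in points]


def scatter_matrix(centered):
    """p x p scatter (unnormalized covariance) matrix of centered points."""
    p = len(centered[0])
    return [[sum(v[i] * v[j] for v in centered) for j in range(p)] for i in range(p)]


def orthonormalize(v, basis, eps=1e-12):
    """Gram-Schmidt step: remove components along `basis`, then normalize (None if ~0)."""
    for b in basis:
        d = sum(vi * bi for vi, bi in zip(v, b))
        v = [vi - d * bi for vi, bi in zip(v, b)]
    n = math.sqrt(sum(vi * vi for vi in v))
    return [vi / n for vi in v] if n > eps else None


def top_eigvecs(mat, k=2, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix by deflated power iteration (assumes k <= p)."""
    rng = random.Random(seed)
    p = len(mat)
    basis = []
    for _ in range(k):
        v = orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(p)], basis)
        for _ in range(iters):
            w = orthonormalize([sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)], basis)
            if w is None:  # no variance left outside the directions already found
                break
            v = w
        basis.append(v)
    return basis


def project_to_plane(points):
    """2-D coordinates of each trajectory point in the best-fit (PCA) plane, a stand-in criterion."""
    centered = mean_center(points)
    e1, e2 = top_eigvecs(scatter_matrix(centered), k=2)
    return [(sum(a * b for a, b in zip(v, e1)), sum(a * b for a, b in zip(v, e2))) for v in centered]


def straightness(path2d):
    """Chord length over arc length: 1.0 for a straight path, smaller as the path bends."""
    arc = sum(math.dist(path2d[i], path2d[i + 1]) for i in range(len(path2d) - 1))
    chord = math.dist(path2d[0], path2d[-1])
    return chord / arc if arc > 0 else 1.0
```

Given trajectories of linear prediction coefficients, LPC cepstrum coefficients, and NOCs over the same transient segment, a score closer to 1.0 would correspond to the straighter behavior the abstract reports for NOCs; the sketch itself makes no claim about which parameter set scores higher on real speech.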