1998 Fiscal Year Final Research Report Summary
Model and example based prosodic feature extraction and its efficient integration for speech recognition along with phoneme-based recognition
Project/Area Number |
08680391
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Japan Advanced Institute of Science and Technology, Hokuriku |
Principal Investigator |
SHIMODAIRA Hiroshi JAIST, School of Information Science, Associate Professor (30206239)
|
Co-Investigator(Kenkyū-buntansha) |
NAKAI Mitsuru JAIST, School of Information Science, Research Associate (60283149)
|
Project Period (FY) |
1996 – 1998
|
Keywords | prosody / prosodic-boundary / pitch pattern / speech recognition |
Research Abstract |
The aim of this research is to exploit the prosodic information contained in speech for automatic speech recognition, since prosodic information plays an important role alongside phonemic information.
(a) Robust pitch determination algorithm: In contrast to conventional pitch trackers based on numerical curve fitting, the proposed method estimates a continuous F0 pattern using a quantitative pitch generation model of the kind often used to synthesize F0 contours from prosodic event commands. An inverse filtering technique is employed to obtain the initial candidates for the prosodic commands. To find the optimal command sequence among these candidates efficiently, a beam-search algorithm and an N-best technique are employed. Preliminary experiments on a male speaker from the ATR B-set database showed promising results in both the quality of the restored pattern and the estimation of the prosodic events. Along with this improvement in F0 smoothing, a novel frame-wise pitch determination algorithm that also provides a reliability measure for the pitch frequency was proposed.
(b) Prosodically guided speech recognition:
i. As a first step toward speech recognition based on prosodic information, an isolated word recognition task in a noisy environment was examined. Experiments showed that the word pitch pattern helps reduce ambiguity when discriminating similar words.
ii. It was shown that the dependencies between consecutive phrases can be measured by means of prosodic features; an accuracy of 87% was obtained on the ATR read speech data.
iii. A prototype prosodically guided speech recognition system was developed, in which phrase hypotheses given by phoneme recognition are rescored on the basis of the likelihood of phrase boundaries measured by prosodic features.
|
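The rescoring step described in (b) iii can be illustrated with a minimal sketch. This is not the project's implementation: the report does not say how the phoneme and prosodic scores are combined, so the log-linear combination, the `prosody_weight` parameter, and the names `PhraseHypothesis` and `rescore` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhraseHypothesis:
    """One phrase-sequence hypothesis (hypothetical structure, for illustration)."""
    phrases: List[str]              # phrase sequence proposed by phoneme recognition
    phoneme_log_lik: float          # log-likelihood from the phoneme recognizer
    boundary_log_liks: List[float]  # log-likelihood of each phrase boundary,
                                    # as judged from prosodic features


def rescore(hyps: List[PhraseHypothesis],
            prosody_weight: float = 1.0) -> List[PhraseHypothesis]:
    """Return hypotheses sorted best-first by a combined score.

    Assumed scoring rule: phoneme log-likelihood plus a weighted sum of
    the prosodic boundary log-likelihoods.
    """
    def combined(h: PhraseHypothesis) -> float:
        return h.phoneme_log_lik + prosody_weight * sum(h.boundary_log_liks)

    return sorted(hyps, key=combined, reverse=True)


if __name__ == "__main__":
    hyps = [
        PhraseHypothesis(["A1", "A2", "A3"], -10.0, [-3.0, -3.0]),
        PhraseHypothesis(["B1", "B2", "B3"], -11.0, [-0.5, -0.5]),
    ]
    for h in rescore(hyps):
        print(h.phrases)
```

In this toy input, the second hypothesis has a slightly worse phoneme score but much more plausible phrase boundaries, so it is ranked first once prosody is taken into account. Setting `prosody_weight` to 0 recovers the ranking from phoneme recognition alone.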