Project/Area Number |
61460131
|
Research Category |
Grant-in-Aid for General Scientific Research (B)
|
Allocation Type | Single-year Grants |
Research Field |
Electronic and Communication Systems Engineering
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
IMAI Satoshi Tokyo Institute of Technology, Research Laboratory of Precision Machinery and Electronics, Professor (50016763)
|
Co-Investigator(Kenkyū-buntansha) |
FURUICHI Chieko Tokyo Institute of Technology, Research Laboratory of Precision Machinery and Electronics, Research Associate (90016783)
|
Project Period (FY) |
1986 – 1987
|
Project Status |
Completed (Fiscal Year 1987)
|
Budget Amount |
¥6,600,000 (Direct Cost: ¥6,600,000)
Fiscal Year 1987: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 1986: ¥5,800,000 (Direct Cost: ¥5,800,000)
|
Keywords | Speech processing / Recognition-synthesis system / Segmentation into phonemic units / Mel-cepstrum / Acoustic processing / Knowledge processing / Unbiased log spectral estimator / Improved cepstral method / Log spectrum / Distance measure / Residual signal / Pattern matching / Synthesis / Segmentation / Top-down / Link structure |
Research Abstract |
This research showed that a speech recognition-synthesis system based on mel-cepstral processing and multi-level knowledge processing is very effective for building a natural human-machine communication system. The acoustic processor is a key component, because the success of a speech recognition system depends mainly on the performance of its acoustic-phonetic processor. For advanced acoustic processing, we proposed an unbiased estimator of the log spectrum. This unbiased log spectral estimation technique extracts an accurate and stable spectral envelope. Using several segmentation parameters derived from the unbiased log spectral estimate of the speech signal, continuous Japanese speech can be successfully segmented into phonemic units. The dynamic segmentation parameters, obtained by applying a quasi-derivative operation to the spectral envelope parameters, are stable enough for detecting phonemic boundaries. The segmentation system was evaluated on continuous speech read at a normal reading rate by 3 female and 3 male speakers. For 1012 nominal phonemic units, the segmentation error was 3.6%, consisting of 1.98% missed and 1.58% extra segments. The segmentation system can be used as a subsystem of the recognition-synthesis system. We also compared cepstral and mel-cepstral distance measures with the traditional LPC distance measures. This comparison showed that the mel-cepstral distance is much more effective for word recognition than the LPC cepstral or WLR distance measure. For a rule-based synthesis system that produces high-quality speech, the lack of a formulation of the excitation source is a serious problem: intelligible, very high-quality synthesis requires an excitation signal with better properties to replace the usual impulse train and M-sequence. We proposed a method of generating an excitation signal whose spectral envelope and level are determined by the results of very-short-time spectral analysis.
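The abstract reports that the mel-cepstral distance outperformed the LPC cepstral and WLR distances for word recognition, but it gives no formulas. Below is a minimal sketch that rests on two assumptions not stated in the source. First, it obtains the mel-cepstrum by all-pass frequency warping of an ordinary cepstrum with a warping factor `alpha`, using the recursion implemented by the `freqt` command of the Speech Signal Processing Toolkit (SPTK). Second, it measures the distance between frames as a Euclidean cepstral distance. The sketch only warps an existing cepstrum. It is not the unbiased log spectral estimation described above. `freqt` and `cepstral_distance_db` are illustrative Python functions, not SPTK's API.

```python
import math


def freqt(c, order, alpha):
    """Frequency-warp a cepstrum c[0..M] into order+1 warped coefficients.

    Uses the all-pass substitution with warping factor alpha (|alpha| < 1),
    following the recursion of SPTK's freqt. Calling it again with -alpha
    undoes the warping, up to truncation of the coefficient tail.
    """
    g = [0.0] * (order + 1)
    beta = 1.0 - alpha * alpha
    for ci in reversed(c):  # feed c[M], c[M-1], ..., c[0]
        d = g[:]  # outputs of the previous stage
        g[0] = ci + alpha * d[0]
        if order >= 1:
            g[1] = beta * d[0] + alpha * d[1]
        for j in range(2, order + 1):
            g[j] = d[j - 1] + alpha * (d[j] - g[j - 1])
    return g


def cepstral_distance_db(c1, c2):
    """RMS log-spectral difference in dB between two cepstra of ln|H|.

    The gain term c[0] is ignored. By Parseval's relation, the RMS difference
    of ln|H| over frequency is sqrt(2 * sum_k (c1[k] - c2[k])**2) for
    untruncated cepstra; the factor 20 / ln(10) converts nepers to dB.
    """
    s = sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:]))
    return 20.0 / math.log(10.0) * math.sqrt(2.0 * s)


if __name__ == "__main__":
    ALPHA = 0.35  # hypothetical choice; values near 0.35 are often used at 10 kHz
    mel_a = freqt([0.5, 0.3, -0.1, 0.05], 12, ALPHA)
    mel_b = freqt([0.4, 0.2, -0.05, 0.0], 12, ALPHA)
    print(f"mel-cepstral distance: {cepstral_distance_db(mel_a, mel_b):.2f} dB")
```

When the distance is applied to mel-cepstra, the RMS is taken over the warped frequency axis, so low frequencies receive more weight than on a linear axis.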
|
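The abstract does not say which quasi-derivative operator or which segmentation parameters were used. The minimal sketch below assumes the "quasi-derivative" is a least-squares regression slope over a short window of frames, a common choice that the source does not confirm. It computes dynamic parameters from a sequence of spectral-envelope vectors and marks frames where their magnitude peaks as candidate phonemic boundaries. The window half-width `K`, the `threshold`, and both function names are hypothetical.

```python
import math


def regression_delta(frames, K=2):
    """Quasi-derivative of each parameter track: the least-squares slope
    over frames t-K..t+K. Frames beyond either end are replaced by the
    first or last frame."""
    T = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(k * k for k in range(1, K + 1))  # sum of k^2 for k = -K..K
    deltas = []
    for t in range(T):
        row = []
        for i in range(dim):
            acc = 0.0
            for k in range(1, K + 1):
                nxt = frames[min(t + k, T - 1)][i]
                prv = frames[max(t - k, 0)][i]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def boundary_candidates(frames, K=2, threshold=0.5):
    """Frames where the magnitude of the delta vector has a local peak
    that reaches threshold (a hypothetical tuning parameter)."""
    mag = [math.sqrt(sum(v * v for v in row)) for row in regression_delta(frames, K)]
    return [
        t
        for t in range(1, len(mag) - 1)
        if mag[t] >= threshold and mag[t] > mag[t - 1] and mag[t] >= mag[t + 1]
    ]


if __name__ == "__main__":
    # Toy 2-dimensional "spectral envelope" track with one abrupt change.
    track = [[0.0, 1.0]] * 8 + [[1.0, 0.0]] * 8
    print(boundary_candidates(track, K=2, threshold=0.3))
```

A regression slope averages over 2K+1 frames, so it is less sensitive to frame-to-frame jitter than a simple first difference. This matches the stability the abstract attributes to its dynamic parameters, although the authors' actual operator may differ.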