Continuous speech recognition with adaptability to the speaking rate of the input speech
Project/Area Number |
07458064
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Tohoku University |
Principal Investigator |
MAKINO Shozo Tohoku Univ., Computer Center, Prof. (00089806)
|
Co-Investigator(Kenkyū-buntansha) |
SUZUKI Motoyuki Tohoku Univ., Computer Center, Research Assoc. (30282015)
SONE Hideaki Tohoku Univ., Graduate School of Information Sciences, Assoc. Prof. (40134019)
ITO Akinori (伊藤 彰則) Yamagata Univ., Faculty of Engineering, Lecturer (70232428)
ABE Masato (安倍 正人) Tohoku Univ., Computer Center, Assoc. Prof. (00159443)
|
Project Period (FY) |
1995 – 1997
|
Project Status |
Completed (Fiscal Year 1997)
|
Budget Amount |
¥6,400,000 (Direct Cost: ¥6,400,000)
Fiscal Year 1997: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 1996: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 1995: ¥4,800,000 (Direct Cost: ¥4,800,000)
|
Keywords | continuous speech recognition / phoneme recognition / speaking rate / speaker adaptation / duration / preliminary recognition |
Research Abstract |
This research developed a spoken word recognition system that uses phoneme duration information estimated from the speaking rate of the input speech. The speaking rate is assumed to be reflected in the average vowel length. The acoustic processor transforms the input speech into a similarity matrix using the modified LVQ2. The average vowel length is computed from the preliminary recognition result, and the duration of each phoneme in each word template is then estimated from this average vowel length. Spoken word recognition experiments were carried out using DTW (dynamic time warping) with the estimated phoneme durations taken into account. The word recognition score was 97.3% for the 212-word vocabulary uttered by 5 male speakers (test set). The phoneme duration information was collected from the 212-word vocabulary uttered by another 5 male and 10 female speakers (training set). A hybrid combination of the preceding-phoneme-dependent estimation and the following-phoneme-dependent estimation gave the best performance. The method was then extended to phoneme recognition, where the phoneme accuracy increased from 71.8% to 86.3% for phonemes in the 212-word vocabulary uttered by 5 male speakers (test set).
|
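The duration estimation step described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the abstract does not give the estimation formula, so the linear model here (expected duration = per-phoneme ratio × average vowel length) is an assumption, as are the names `average_vowel_length`, `estimate_durations`, `ratio_table`, and the five-vowel set. The preceding- and following-phoneme dependence mentioned in the abstract is not modeled.

```python
# Assumed vowel inventory (hypothetical; the abstract does not list it).
VOWELS = {"a", "i", "u", "e", "o"}


def average_vowel_length(segments):
    """Average vowel length in frames.

    segments: list of (phoneme, start_frame, end_frame) tuples, standing in
    for the output of a preliminary recognition pass.
    """
    lengths = [end - start for phoneme, start, end in segments if phoneme in VOWELS]
    if not lengths:
        raise ValueError("preliminary recognition result contains no vowels")
    return sum(lengths) / len(lengths)


def estimate_durations(template, ratio_table, avg_vowel_len, default_ratio=1.0):
    """Expected duration of each phoneme in a word template.

    ratio_table maps a phoneme to the ratio of its typical duration to the
    average vowel length; in this sketch it is a hypothetical table that
    would be learned from training data. Assumed model: ratio * average.
    """
    return [ratio_table.get(phoneme, default_ratio) * avg_vowel_len
            for phoneme in template]


# Usage: a slower utterance has longer vowels, so every expected phoneme
# duration in the template is stretched by the same factor.
preliminary = [("k", 0, 5), ("a", 5, 15), ("w", 15, 19), ("a", 19, 33)]
avg = average_vowel_length(preliminary)
expected = estimate_durations(["k", "a", "w"], {"a": 1.0, "k": 0.5}, avg)
```

The expected durations could then serve as a constraint or penalty term when a DTW path is matched against the word template; how the project combined them with DTW is not stated in the abstract.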
Report
(4 results)
Research Products
(22 results)