1995 Fiscal Year Final Research Report Summary
Speaker-independent word recognition method in which a group of words can freely and easily be constructed
Project/Area Number |
06650424
|
Research Category |
Grant-in-Aid for General Scientific Research (C)
|
Allocation Type | Single-year Grants |
Research Field |
Information and communication engineering
|
Research Institution | Kumamoto University |
Principal Investigator |
WATANABE Akira Kumamoto University, Dept. of Electrical Engineering & Computer Science, Faculty of Engineering, Professor (50040382)
|
Co-Investigator(Kenkyū-buntansha) |
IKEDA Takashi Kurume National College of Technology, Associate Professor (80222884)
UEDA Yuichi Kumamoto University, Dept. of Electrical Engineering & Computer Science, Faculty of Engineering, Associate Professor (00141961)
|
Project Period (FY) |
1994 – 1995
|
Keywords | speaker-independent / word recognition / input parameters / statistical distance / neural network / phoneme template |
Research Abstract |
This research investigates how to achieve a practical speaker-independent word recognition method that is easily applicable to any word group. The system consists of a standard phoneme template, a word dictionary independent of the template, a phoneme-distance matrix computation, and word judgment by DP matching. To improve recognition rates in this flexible scheme, new compound speech parameters have been tested. These parameters, namely mel-band filter bank outputs, normalized formant correlates, and neural network outputs on manner of articulation and voice sources, may have complementary effects on the improvement. First, from a practical viewpoint, the consonant template was made from the speech data of only one speaker, and the Euclidean distance was used. In recognition tests using 3 groups of 30 words uttered by 30 speakers, a recognition rate of 95-96% was achieved with the compound parameters, whereas the rate with mel-filter bank parameters only or mel-cepstrum parameters only was 89-90%. Next, the standard phoneme template was collected from utterances of 20 speakers, and 50 words uttered by 30 speakers different from those used for the template were tested. The total distance for the compound parameters is defined as a weighted linear sum of the individual parameter distances. The weights were chosen to maximize phoneme recognition rates, which were evaluated on phonemes in words uttered by 4 new speakers. The results of the word recognition tests using two kinds of distance are as follows: (1) For all combinations of the parameters, the recognition rates with the Bayes distance are higher than with the Euclidean distance. (2) When all of the parameters are used, the recognition rate was 96.8% with the Bayes distance metric and 94.7% with the Euclidean distance. Thus, it has been concluded that the proposed method is very useful.
|
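The pipeline outlined in the abstract (compound distance as a weighted linear sum, a phoneme-distance matrix, and word judgment by DP matching against a dictionary of phoneme sequences) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy templates, the weights, and the two-word dictionary are all hypothetical, only the Euclidean variant is shown, and the DP path constraint (each phoneme covers at least one consecutive frame) is an assumed simplification.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def compound_distance(frame, template, weights):
    """Weighted linear sum of the per-parameter-group distances."""
    return sum(w * euclidean(frame[name], template[name])
               for name, w in weights.items())

def phoneme_distance_matrix(frames, templates, weights):
    """One row per input frame: {phoneme: distance to that phoneme's template}."""
    return [{ph: compound_distance(f, tpl, weights)
             for ph, tpl in templates.items()} for f in frames]

def dp_word_distance(dist, phoneme_seq):
    """DP matching of the frame sequence against a word's phoneme sequence.

    Assumed path constraint: frames are consumed in order and each phoneme
    covers one or more consecutive frames. Returns the accumulated distance
    of the best path divided by the number of frames.
    """
    n_frames, n_ph = len(dist), len(phoneme_seq)
    if n_frames < n_ph:
        return math.inf
    prev = [math.inf] * n_ph
    prev[0] = dist[0][phoneme_seq[0]]
    for t in range(1, n_frames):
        cur = [math.inf] * n_ph
        for j in range(n_ph):
            # Either stay in phoneme j or advance from phoneme j - 1.
            best = prev[j] if j == 0 else min(prev[j], prev[j - 1])
            cur[j] = dist[t][phoneme_seq[j]] + best
        prev = cur
    return prev[-1] / n_frames

def recognize(frames, templates, weights, dictionary):
    """Return the dictionary word with the smallest DP distance, plus all scores."""
    dist = phoneme_distance_matrix(frames, templates, weights)
    scores = {word: dp_word_distance(dist, seq)
              for word, seq in dictionary.items()}
    return min(scores, key=scores.get), scores

# Hypothetical toy data: two parameter groups per phoneme template.
TEMPLATES = {
    "a": {"filterbank": [1.0, 0.0], "formant": [0.8]},
    "k": {"filterbank": [0.0, 1.0], "formant": [0.1]},
    "i": {"filterbank": [0.5, 0.5], "formant": [0.3]},
}
WEIGHTS = {"filterbank": 0.7, "formant": 0.3}  # illustrative values only
DICTIONARY = {"aka": ["a", "k", "a"], "aki": ["a", "k", "i"]}

if __name__ == "__main__":
    frames = [TEMPLATES[p] for p in ["a", "a", "k", "i", "i"]]
    word, scores = recognize(frames, TEMPLATES, WEIGHTS, DICTIONARY)
    print(word, scores)
```

Because the word dictionary holds only phoneme sequences and is separate from the phoneme templates, changing the word group in this sketch amounts to editing `DICTIONARY`, which mirrors the flexibility the abstract emphasizes. A Bayes-distance variant would replace `euclidean` with a statistical distance that uses per-phoneme covariance information; that is not shown here.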