Speaker-independent word recognition method in which a group of words can freely and easily be constructed

Research Project

Project/Area Number	06650424
Research Category	Grant-in-Aid for General Scientific Research (C)
Allocation Type	Single-year Grants
Research Field	Information and communication engineering
Research Institution	Kumamoto University
Principal Investigator	WATANABE Akira Kumamoto University, Faculty of Engineering, Dept. of Electrical Engineering & Computer Science, Professor (50040382)
Co-Investigator(Kenkyū-buntansha)	IKEDA Takashi Kurume National College of Technology, Associate Professor (80222884); UEDA Yuichi Kumamoto University, Faculty of Engineering, Dept. of Electrical Engineering & Computer Science, Associate Professor (00141961)
Project Period (FY)	1994 – 1995
Project Status	Completed (Fiscal Year 1995)
Budget Amount	¥2,000,000 (Direct Cost: ¥2,000,000) Fiscal Year 1995: ¥600,000 (Direct Cost: ¥600,000) Fiscal Year 1994: ¥1,400,000 (Direct Cost: ¥1,400,000)
Keywords	speaker-independent / word recognition / input parameters / statistical distance / neural network / phoneme template / word dictionary / similarity distance
Research Abstract	This research investigates how to achieve a practical speaker-independent word recognition method that can easily be applied to any word group. The system consists of a standard phoneme template, a word dictionary independent of the template, computation of a phoneme-distance matrix, and word judgment by DP matching. To improve recognition rates in this flexible scheme, new compound speech parameters have been tested. These parameters, namely mel-band filter bank outputs, normalized formant correlates, and neural network outputs for manner of articulation and voice sources, may have complementary effects on the improvement. First, from a practical viewpoint, the consonant template was built from the speech data of only one speaker, and the Euclidean distance was used. In recognition tests using 3 groups of 30 words uttered by 30 speakers, the compound parameters achieved a recognition rate of 95-96%, while the mel-filter bank alone or mel-cepstrum parameters alone achieved 89-90%. Next, the standard phoneme template was collected from the utterances of 20 speakers, and tests were run on 50 words uttered by 30 speakers different from those used for the template. The total distance for the compound parameters is defined as a weighted linear sum of the individual parameter distances. The weights were chosen to maximize the phoneme recognition rate, which was measured on phonemes in words uttered by 4 new speakers. The results of the word recognition tests using the two kinds of distance are as follows: (1) For every combination of the parameters, the recognition rate with the Bayes distance is higher than with the Euclidean distance. (2) When all of the parameters are used, the recognition rate is 96.8% with the Bayes distance and 94.7% with the Euclidean distance. Thus, it is concluded that the proposed method is very useful.
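
The pipeline outlined in the abstract (compound distance as a weighted linear sum, a phoneme-distance matrix, and word judgment by DP matching) might be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the project's implementation: the function names, the toy templates, the weights, and the DP constraint (phonemes matched in order, each covering at least one frame) are all hypothetical, and only the Euclidean variant is shown, since the abstract does not give the form of the Bayes distance.

```python
import math

def compound_distance(frame, template, weights):
    # Weighted linear sum of per-parameter Euclidean distances.
    # `weights` maps a parameter-group name to its weight (values are assumed).
    return sum(w * math.dist(frame[name], template[name])
               for name, w in weights.items())

def phoneme_distance_matrix(frames, templates, weights):
    # For every phoneme template, the distance to each input frame.
    return {p: [compound_distance(f, tpl, weights) for f in frames]
            for p, tpl in templates.items()}

def dp_word_distance(phoneme_seq, dist):
    # DP alignment of a non-empty phoneme sequence to the frames.
    # Assumed constraint: phonemes appear in order, each spans >= 1 frame.
    n = len(phoneme_seq)
    n_frames = len(dist[phoneme_seq[0]])
    if n_frames < n:
        return math.inf
    prev = [math.inf] * n
    prev[0] = dist[phoneme_seq[0]][0]
    for t in range(1, n_frames):
        cur = [math.inf] * n
        for i, p in enumerate(phoneme_seq):
            # Either stay in phoneme i or advance from phoneme i-1.
            best = prev[i] if i == 0 else min(prev[i], prev[i - 1])
            cur[i] = best + dist[p][t]
        prev = cur
    return prev[-1]

def recognize(frames, templates, dictionary, weights):
    # The word dictionary is independent of the templates: it only lists
    # phoneme sequences, so words can be added without new speech data.
    dist = phoneme_distance_matrix(frames, templates, weights)
    return min(dictionary, key=lambda w: dp_word_distance(dictionary[w], dist))

# Toy data (entirely made up): two parameter groups per phoneme/frame.
templates = {
    "a": {"mel": [1.0, 0.0], "formant": [0.0]},
    "k": {"mel": [0.0, 1.0], "formant": [1.0]},
    "i": {"mel": [1.0, 1.0], "formant": [2.0]},
}
weights = {"mel": 0.7, "formant": 0.3}  # hypothetical weights
dictionary = {"aka": ["a", "k", "a"], "aki": ["a", "k", "i"]}
frames = [
    {"mel": [0.9, 0.1], "formant": [0.1]},  # close to "a"
    {"mel": [1.0, 0.1], "formant": [0.0]},  # close to "a"
    {"mel": [0.1, 0.9], "formant": [1.1]},  # close to "k"
    {"mel": [0.9, 0.0], "formant": [0.1]},  # close to "a"
]
result = recognize(frames, templates, dictionary, weights)
```

Because the dictionary holds only phoneme sequences, adding a word in this sketch means adding one entry to `dictionary`, which mirrors the abstract's point that the word group can be changed freely without retraining the templates.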

Report

(3 results)

1995 Annual Research Report Final Research Report Summary
1994 Annual Research Report