Improvement of Speech Recognition Ratio in Noisy Environment

Research Project

Project/Area Number	60460145
Research Category	Grant-in-Aid for General Scientific Research (B)
Allocation Type	Single-year Grants
Research Field	Measurement and Control Engineering
Research Institution	Kumamoto University
Principal Investigator	JOSUKE Okuda Faculty of Engineering, Kumamoto University, Professor (70040342)
Co-Investigator(Kenkyū-buntansha)	UEDA Yuichi Faculty of Engineering, Kumamoto University, Assistant (00141961); USAGAWA Tsuyoshi Faculty of Engineering, Kumamoto University, Assistant (30160229); SONODA Yorinobu Faculty of Engineering, Kumamoto University, Professor (70037836); WATANABE Akira Faculty of Engineering, Kumamoto University, Professor (50040382); EBATA Masanao Faculty of Engineering, Kumamoto University, Professor (40005319)
Project Period (FY)	1985 – 1986
Project Status	Completed (Fiscal Year 1986)
Budget Amount	¥2,200,000 (Direct Cost: ¥2,200,000) Fiscal Year 1986: ¥2,200,000 (Direct Cost: ¥2,200,000)
Keywords	Man-Machine Interface / Speech Recognition System / Surrounding Noise / Spatial Summation / Directional Microphone / Pharynx Microphone / PARCOR Analysis-Synthesis System
Research Abstract	Remarkable progress in semiconductor technology and speech recognition techniques has made it possible to build a speech recognition system from a few LSIs, although such systems are limited to a specific speaker and isolated words. However, such a system does not work well in a noisy environment. The influence of surrounding noise becomes serious when these LSIs are used in devices such as a voice-operated door key, an automobile telephone, or a voice-operated television remote controller. In this project, the effect of background noise is studied, and methods to improve speech recognition performance in noisy environments are proposed. The improvement is achieved as follows: an adaptive time window gate is applied to pick out the word from the noisy input signal, and several methods are used to improve the SN ratio, namely spatial synchronous summation using a few microphones, use of a directional microphone or a pharynx microphone, and use of a PARCOR analysis-synthesis system as a noise reduction filter. As a result, the speaker-dependent speech recognition system, which could not work when the SN ratio was below 47 dB, now works at SN ratios down to 15 dB for arbitrary words and down to 10 dB for selected words. The performance of the speaker-independent recognition system is also improved by the proposed methods. Although the improvement is not yet sufficient, this is a first step toward a voice communication system for practical use.
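The abstract does not describe how the synchronous summation over a few microphones was implemented. As an illustration only, the following is a minimal sketch of generic delay-and-sum averaging, not the project's actual method. It assumes the arrival delay at each microphone is a known whole number of samples and that the noise at each microphone is independent Gaussian noise; all function names, delays, and signal parameters are hypothetical. Under those assumptions, averaging M aligned channels leaves the speech unchanged and reduces the noise power by a factor of M, so the SN ratio should rise by about 10 log10(M) dB.

```python
import math
import random


def synchronous_sum(channels, delays):
    """Align each channel by its integer sample delay, then average them."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


def snr_db(clean, noisy):
    """SN ratio in dB of `noisy`, measured against the known `clean` signal."""
    p_signal = sum(s * s for s in clean)
    p_noise = sum((x - s) ** 2 for s, x in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)


def simulate(num_mics=4, n=4000, noise_std=1.0, seed=0):
    """Return (SN ratio of one microphone, SN ratio after summation) in dB."""
    rng = random.Random(seed)
    # Stand-in for a speech signal: a plain sine wave (assumption).
    clean = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    # Hypothetical arrival delays, in samples, for each microphone.
    delays = [3 * m for m in range(num_mics)]
    channels = []
    for d in delays:
        # Each microphone hears the signal d samples late, plus its own noise.
        delayed = [0.0] * d + clean
        channels.append([x + rng.gauss(0.0, noise_std) for x in delayed])
    summed = synchronous_sum(channels, delays)
    single = channels[0][delays[0]:delays[0] + len(summed)]
    reference = clean[:len(summed)]
    return snr_db(reference, single), snr_db(reference, summed)


if __name__ == "__main__":
    before, after = simulate()
    print(f"single microphone: {before:.1f} dB, after summation: {after:.1f} dB")
```

In a real room the noise at nearby microphones is partly correlated and the delays must be estimated, so the gain in practice would be smaller than this idealized figure.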