2006 Fiscal Year Final Research Report Summary

Study on elderly speech recognition for achieving speech interfaces in ubiquitous computing environment

Research Project

Project/Area Number	17560345
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Communication/Network engineering
Research Institution	Kyushu Institute of Technology
Principal Investigator	NIYADA Katsuyuki, Kyushu Inst. of Tech., Faculty of Eng., Professor (50363396)
Co-Investigator (Kenkyū-buntansha)	MIZUMACHI Mitsunori, Kyushu Inst. of Tech., Faculty of Eng., Research Associate (90380740)
Project Period (FY)	2005 – 2006
Keywords	elderly speech / hoarse voice / briskness of voice / speech recognition / speech interface
Research Abstract	Speech recognition is an attractive technology for building user-friendly human-machine interfaces for elderly people. However, recognition performance drops substantially on elderly speech compared with speech from non-elderly adults. This study aims to improve the recognition rate for elderly speech and uses acoustic analysis to characterize elderly speech quantitatively. Focusing on "non-briskness" and "hoarseness" as properties of elderly speech, we identified relationships between subjective characteristics rated in listening tests and objective features obtained by acoustic analysis.

Concerning non-brisk elderly voices, it is well known that subjective non-briskness is caused by imprecise movements of the articulatory organs due to aging. In this study, we found that the degree of subjective non-briskness is related to the temporal movement of the spectral envelope between successive phonemes. The time evolution of speech power is likewise related to subjective non-briskness.

Concerning hoarse elderly voices, we consider that noise generated at the aged vocal cords produces the perceived hoarseness. We compared, for each Japanese vowel, the averaged amplitude spectra of hoarse elderly voices with those of normal voices uttered by non-elderly adults, and found that hoarse elderly voices had lower power in the mid-frequency range (1.5–2.5 kHz) and higher power in the high-frequency range above 2.5 kHz. This difference can be described compactly in terms of the tilt of the amplitude spectrum. We then carried out speech recognition with a preprocessor that adjusts the spectral tilt of elderly voices to that of non-elderly adults, and confirmed that the preprocessor worked well on a vowel recognition task. We therefore conclude that hoarseness is one of the reasons for the lower recognition rate of elderly speech.
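The summary does not state how spectral tilt was measured or how the preprocessor adjusted it. The Python sketch below shows one common approach under stated assumptions. Tilt is taken as the slope, in dB per kHz, of a least-squares line fitted to a frame's log-amplitude spectrum. The adjustment is a gain that is linear in dB across frequency, so the fitted slope matches a reference value. A flatter (less negative) tilt is one possible reading of the reported mid-band decrease and high-band increase. The function names (`spectral_tilt`, `adjust_tilt`, `band_mean_db`), the 1 kHz pivot, and all tilt values are hypothetical illustrations and were not taken from the study.

```python
import math
from typing import List, Sequence, Tuple

FLOOR = 1e-12  # amplitude floor so log10 never receives zero


def _to_db(a: float) -> float:
    return 20.0 * math.log10(max(a, FLOOR))


def spectral_tilt(freqs_hz: Sequence[float], amp: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical tilt measure: least-squares fit of dB = slope * f_kHz + intercept.

    Returns (slope in dB/kHz, intercept in dB).
    """
    if len(freqs_hz) != len(amp) or len(amp) < 2:
        raise ValueError("need >= 2 matching frequency/amplitude points")
    xs = [f / 1000.0 for f in freqs_hz]
    ys = [_to_db(a) for a in amp]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("frequencies must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def adjust_tilt(freqs_hz: Sequence[float], amp: Sequence[float],
                target_slope: float, pivot_hz: float = 1000.0) -> List[float]:
    """Apply a gain linear in dB so the fitted tilt becomes target_slope (dB/kHz).

    The gain is 0 dB at pivot_hz (a hypothetical choice). Assumes no bin sits at
    the amplitude floor, since floored bins would bias the refitted slope.
    """
    slope, _ = spectral_tilt(freqs_hz, amp)
    delta = target_slope - slope  # extra dB per kHz to add
    return [a * 10.0 ** (delta * (f - pivot_hz) / 1000.0 / 20.0)
            for f, a in zip(freqs_hz, amp)]


def band_mean_db(freqs_hz: Sequence[float], amp: Sequence[float],
                 lo_hz: float, hi_hz: float = math.inf) -> float:
    """Mean log-amplitude (dB) over lo_hz <= f < hi_hz, e.g. 1500-2500 Hz or >= 2500 Hz."""
    vals = [_to_db(a) for f, a in zip(freqs_hz, amp) if lo_hz <= f < hi_hz]
    if not vals:
        raise ValueError("no bins in band")
    return sum(vals) / len(vals)


if __name__ == "__main__":
    # Synthetic spectra with made-up tilts; these are NOT data from the study.
    freqs = [100.0 * k for k in range(1, 41)]  # 100 Hz .. 4 kHz
    elderly_like = [10.0 ** (-4.0 * f / 1000.0 / 20.0) for f in freqs]
    reference_slope = -6.0  # hypothetical non-elderly tilt, dB/kHz
    corrected = adjust_tilt(freqs, elderly_like, reference_slope)
    print(round(spectral_tilt(freqs, elderly_like)[0], 3))  # -4.0
    print(round(spectral_tilt(freqs, corrected)[0], 3))     # -6.0
    print(band_mean_db(freqs, corrected, 2500.0) < band_mean_db(freqs, elderly_like, 2500.0))  # True
```

A real preprocessor would more likely apply the correction as a time-domain filter (for example, a first-order pre-emphasis filter). Working directly on the amplitude spectrum keeps the sketch short and makes the effect on the fitted tilt easy to check.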