1989 Fiscal Year Final Research Report Summary
Word Recognition using a Two-Dimensional Mel-cepstrum under Noisy Environments.
Project/Area Number |
63550253
|
Research Category |
Grant-in-Aid for General Scientific Research (C)
|
Allocation Type | Single-year Grants |
Research Field |
Electronic communication systems engineering
|
Research Institution | Nagoya Institute of Technology |
Principal Investigator |
KITAMURA Tadashi, Faculty of Engineering, Nagoya Institute of Technology, Associate Professor (60114865)
|
Project Period (FY) |
1988 – 1989
|
Keywords | noise / word recognition / two-dimensional mel-cepstrum / Japanese digit / dynamic features of spectra / word speech recognition under noise / spoken digits |
Research Abstract |
The purpose of this research is to provide a new method for word recognition in noisy environments. Two kinds of noise are used: white noise generated by computer simulation and colored noise recorded at Nagoya Station. A speaker-independent word recognition method for the ten Japanese digits using a two-dimensional mel-cepstrum (TDMC) is proposed. The TDMC is defined as the two-dimensional Fourier transform of mel-frequency-scaled log spectra in the frequency and time domains, and it consists of average features and dynamic features of the two-dimensional mel-log spectra. The experimental results of this study are as follows.
1. Speech analysis-synthesis system using the TDMC and its evaluation: The structure of a speech analysis-synthesis system using the TDMC is proposed in order to study the size of the TDMC needed to synthesize good-quality speech. It is shown that the required region of the TDMC is limited to frequencies below about 10 Hz.
2. Reference patterns robust to variation in the signal-to-noise ratio (SNR) of input speech: A single set of TDMC reference patterns, obtained from noise-added speech with a desired SNR, is used for word recognition in noisy environments. Experimental results show that recognition using this reference pattern set is more effective than the usual method.
3. Distance measures for word recognition robust to variation in the SNR of input speech: Distance measures using a combination of the dynamic and average features of the TDMC are proposed. It is shown that dynamic features are more important than average features for word recognition in noisy environments.
|
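As an illustration of the TDMC definition and of the noise-added reference patterns described in the abstract, the following is a minimal Python sketch, not the project's actual implementation. It assumes the mel-log spectrum is already available as a list of frames (rows) by mel-frequency bins (columns). The function names `dft`, `tdmc`, `split_features`, and `mix_at_snr` are hypothetical. Treating the zero time-index row as the average features and the remaining rows as the dynamic features is an interpretation of the abstract, not something it states.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform; adequate for small illustrative inputs."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def tdmc(mel_log_spec):
    """Two-dimensional DFT of a mel-log spectrogram (rows: frames, columns: mel bins).

    Returns coefficients indexed as [time_index][frequency_axis_index].
    """
    # First transform every frame along the mel-frequency axis.
    along_freq = [dft(frame) for frame in mel_log_spec]
    n_frames = len(along_freq)
    n_bins = len(along_freq[0])
    # Then transform every resulting column along the time axis.
    columns = [dft([row[q] for row in along_freq]) for q in range(n_bins)]
    # Transpose back so that the first index is the time-axis index.
    return [[columns[q][t] for q in range(n_bins)] for t in range(n_frames)]


def split_features(coeffs):
    """Assumed split: row 0 (no variation over time) versus all other rows."""
    average = coeffs[0]   # proportional to the time average of the per-frame transform
    dynamic = coeffs[1:]  # components that capture change across frames
    return average, dynamic


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    # Three identical frames of a toy 4-bin mel-log spectrum.
    spectrogram = [[1.0, 2.0, 3.0, 4.0]] * 3
    average, dynamic = split_features(tdmc(spectrogram))
    print("average features:", average)
    print("largest dynamic magnitude:", max(abs(c) for row in dynamic for c in row))
```

For a spectrogram whose frames are all identical, every dynamic coefficient in this sketch is zero up to rounding error, which matches the intuition that dynamic features capture change over time. `mix_at_snr` shows one common way to prepare noise-added speech at a chosen SNR; the abstract does not say how the reference patterns were actually generated.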