Project/Area Number |
09680394
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Information Systems (including Library and Information Science)
|
Research Institution | Nagoya Institute of Technology (NIT) |
Principal Investigator |
KITAMURA Tadashi NIT, Faculty of Eng., Professor (60114865)
|
Co-Investigator(Kenkyū-buntansha) |
TOKUDA Keiichi NIT, Faculty of Eng., Associate Professor (20217483)
|
Project Period (FY) |
1997 – 1998
|
Project Status |
Completed (Fiscal Year 1998)
|
Budget Amount |
¥3,300,000 (Direct Cost: ¥3,300,000)
Fiscal Year 1998: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 1997: ¥2,400,000 (Direct Cost: ¥2,400,000)
|
Keywords | speaker identification / speech recognition / facial image / lip-reading / Hidden Markov Model / bimodal / M2VTS / Tulips1 / speech / still image / neural network |
Research Abstract |
1. We proposed a new technique for person recognition using bimodal information consisting of speech and facial images. The proposed method applies a Hidden Markov Model (HMM) to the image sequence of lip movement during a spoken word. We studied intensity and location normalization algorithms and obtained a recognition accuracy of about 95% on the bimodal database Tulips1 (12 persons, 4 English digit words). We also proposed a new normalization algorithm and showed that it requires less computation than the one we proposed previously.
2. We also applied the proposed method to the bimodal database M2VTS, which is larger than Tulips1 and consists of 10 digit words spoken by 37 persons. Furthermore, we studied HMM-based algorithms for normalizing facial images and tracking lip location. We carried out spoken word recognition and speaker identification experiments using only lip-reading information. The experimental results showed that intensity and location normalization is very effective. For 37 persons, we obtained a speaker identification rate of 81.0% using the single word "0" and a word recognition rate of 74.2% for the 10 digits.
3. For speaker identification using speech, we proposed a new spectral parameter estimation method that utilizes the phase characteristics of a second-order all-pass warping function. This method can change the frequency resolution of the speech spectrum in an arbitrary region. Using the proposed method, we carried out speaker recognition experiments based on discriminative feature extraction (DFE), which optimizes the spectral warping function for speaker recognition, and compared the proposed method with conventional ones in speaker identification experiments. The experimental results showed that the proposed method is more effective than conventional methods and that the spectrum around 2 kHz is very important for speaker identification.
|
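As an illustration of item 3 of the abstract, the following is a minimal sketch of how the phase of a second-order all-pass function can act as a frequency warping that raises resolution in a chosen band. The abstract does not give the parameterization, so this assumes a generic second-order all-pass filter with a conjugate pole pair at r·exp(±jθ) and takes half of its negated phase as the warped frequency. The names `allpass2_warp` and `local_resolution`, the parameters `r` and `theta`, and the 16 kHz sampling rate are hypothetical and are not taken from the project.

```python
import math


def allpass2_warp(omega, r, theta):
    """Warped frequency (rad) for input frequency omega in [0, pi].

    Assumption: generic second-order all-pass with poles at r*exp(+/-j*theta).
    Half of its negated phase maps [0, pi] onto [0, pi] monotonically.
    This is an illustrative form, not necessarily the one used in the project.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1 for a stable all-pass filter")

    def section(phi):
        # Half of the excess phase contributed by one first-order section.
        return math.atan2(r * math.sin(phi), 1.0 - r * math.cos(phi))

    return omega + section(omega - theta) + section(omega + theta)


def local_resolution(omega, r, theta):
    """Derivative of the warping; values above 1 mean the band is stretched."""

    def g(phi):
        return (1.0 - r * r) / (1.0 - 2.0 * r * math.cos(phi) + r * r)

    return 0.5 * (g(omega - theta) + g(omega + theta))


if __name__ == "__main__":
    fs = 16000.0                           # hypothetical sampling rate (Hz)
    theta = 2.0 * math.pi * 2000.0 / fs    # centre the stretched band at 2 kHz
    r = 0.5                                # hypothetical pole radius
    for f_hz in (500.0, 2000.0, 6000.0):
        w = 2.0 * math.pi * f_hz / fs
        print(f_hz, allpass2_warp(w, r, theta), local_resolution(w, r, theta))
```

With `theta = 0` the mapping reduces to the familiar first-order all-pass warping, and with `r = 0` it is the identity. In a DFE-style setup, `r` and `theta` would be the quantities tuned for the recognition task, but that training loop is outside this sketch.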