Project/Area Number |
14350204
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Information and Communication Engineering
|
Research Institution | The University of Tokushima |
Principal Investigator |
KUROIWA Shingo The University of Tokushima, Faculty of Engineering, Associate Professor (20333510)
|
Co-Investigator(Kenkyū-buntansha) |
KITA Kenji The University of Tokushima, Center for Advanced Information Technology, Professor (10243734)
REN Fuji The University of Tokushima, Faculty of Engineering, Professor (20264947)
TSUGE Satoru The University of Tokushima, Faculty of Engineering, Assistant Professor (00325250)
|
Project Period (FY) |
2002 – 2004
|
Project Status |
Completed (Fiscal Year 2004)
|
Budget Amount |
¥9,200,000 (Direct Cost: ¥9,200,000)
Fiscal Year 2004: ¥2,100,000 (Direct Cost: ¥2,100,000)
Fiscal Year 2003: ¥4,000,000 (Direct Cost: ¥4,000,000)
Fiscal Year 2002: ¥3,100,000 (Direct Cost: ¥3,100,000)
|
Keywords | Speaker Recognition / Speech Database / Distributed Speech Recognition / Distributed Speaker Verification / Earth Mover's Distance / Nonparametric Modeling / Telephone-Channel Adaptation / Speech Signal Processing / Nonparametric / Segment Quantization / Discriminant Analysis / Channel Characteristics Normalization / Temporal Variability |
Research Abstract |
In this research, we focused on Distributed Speaker Recognition (DSR). DSR splits recognition into two parts: front-end processing on the user's terminal and the speaker recognition engine on the server. The most important advantage of DSR is that it can use high-frequency components, where speaker-specific information appears. On the other hand, DSR must compress the transmitted data to keep the transmission bit rate low. To achieve both high accuracy and a low bit rate, we developed the following techniques:
1) A real-time bias removal method that improves robustness against convolutional noise, which increases quantization distortion and recognition errors.
2) A nonparametric speaker recognition method that combines a histogram-based speaker model with the Earth Mover's Distance.
With these methods, we achieved accuracy equivalent to that of direct microphone input at a low bit rate of 4.8 kbps, the bit rate specified by the European Telecommunications Standards Institute (ETSI) standard for distributed speech recognition.
We also proposed the following techniques, which can be used not only in DSR but also in conventional telephone networks:
3) A phoneme-dependent speaker recognition method in which the client segments the speech signal and sends to the server only the phonemes that are most effective for speaker recognition.
4) A rapid acoustic model adaptation technique for codec-processed speech that uses speech synthesis.
5) Packet-loss concealment algorithms based on MFT-based speech recognition and HMM-based speech synthesis.
Furthermore, we recorded the voices of four speakers every week for two years to investigate how voice characteristics change over time. We are now using these data to explore the essential voice attributes that characterize a speaker.
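The ideas behind techniques 1) and 2) can be illustrated with a minimal sketch. The abstract does not specify the project's actual bias estimator, binning, or EMD formulation, so this sketch makes several assumptions. It uses an exponentially weighted running mean as the "real-time" bias estimate. It builds one normalized histogram per feature dimension as the speaker model. It scores speakers with a per-dimension 1-D Earth Mover's Distance. All function names and parameters (`remove_bias_online`, `alpha`, `lo`, `hi`, `bins`, and so on) are hypothetical.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]
Histogram = List[float]


def remove_bias_online(frames: Sequence[Frame], alpha: float = 0.99) -> List[List[float]]:
    """Frame-by-frame bias removal (assumed: exponentially weighted running mean).

    Only past and current frames are used, so it could run in real time on a terminal.
    """
    if not frames:
        return []
    bias = list(frames[0])  # initial bias estimate: the first frame
    out: List[List[float]] = []
    for frame in frames:
        # Update the running estimate of the stationary (channel) component.
        bias = [alpha * b + (1.0 - alpha) * x for b, x in zip(bias, frame)]
        out.append([x - b for x, b in zip(frame, bias)])
    return out


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> Histogram:
    """Normalized histogram on [lo, hi); out-of-range values go to the edge bins."""
    if not values:
        raise ValueError("need at least one value")
    width = (hi - lo) / bins
    counts = [0.0] * bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1.0
    return [c / len(values) for c in counts]


def emd_1d(p: Histogram, q: Histogram, bin_width: float) -> float:
    """EMD between two normalized 1-D histograms that share the same bins.

    With ground distance |x - y| and equal total mass, the EMD reduces to the
    L1 distance between the cumulative histograms, scaled by the bin width.
    """
    cum_p = cum_q = dist = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        dist += abs(cum_p - cum_q)
    return dist * bin_width


def train_model(frames: Sequence[Frame], lo: float, hi: float, bins: int) -> List[Histogram]:
    """Hypothetical speaker model: one histogram per feature dimension."""
    return [histogram([f[d] for f in frames], lo, hi, bins) for d in range(len(frames[0]))]


def distance(model: List[Histogram], frames: Sequence[Frame],
             lo: float, hi: float, bins: int) -> float:
    """Sum of per-dimension EMDs between a model and a test utterance (smaller = closer)."""
    test = train_model(frames, lo, hi, bins)
    width = (hi - lo) / bins
    return sum(emd_1d(m, t, width) for m, t in zip(model, test))


def identify(models: Dict[str, List[Histogram]], frames: Sequence[Frame],
             lo: float = -10.0, hi: float = 10.0, bins: int = 20) -> str:
    """Return the enrolled speaker whose model is closest to the test frames."""
    return min(models, key=lambda name: distance(models[name], frames, lo, hi, bins))
```

For example, each enrolled speaker's frames can first pass through `remove_bias_online` and then `train_model(frames, -10.0, 10.0, 20)`. The resulting dictionary of models goes to `identify` together with a test utterance processed the same way. Scoring each dimension with its own 1-D EMD ignores correlations between dimensions. That choice only keeps the sketch short and is not a claim about the project's method.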
|