2004 Fiscal Year Final Research Report Summary
SPEECH RECOGNITION WITH SYNCHRONOUS INPUT OF HAND-WRITTEN GESTURES FOR MOBILE DEVICES
Project/Area Number |
15300054
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General (一般) |
Research Field |
Perception information processing/Intelligent robotics
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
SHINODA Koichi Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Associate Professor (10343097)
|
Co-Investigator(Kenkyū-buntansha) |
FURUI Sadaoki Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (90293076)
|
Project Period (FY) |
2003 – 2004
|
Keywords | Speech Recognition / Multimodal Interface / Hand-written Character Recognition / Search Algorithm / Man-Machine Interface / Mobile Device / Hidden Markov Model |
Research Abstract |
Mobile devices are now widely used in daily life, and a user-friendly, highly accurate interface for them is in strong demand. For this purpose, we propose an interface that uses simultaneous input of speech and hand-written gestures. This interface is more robust against environmental noise than a speech-only interface, and its input is faster than a gesture-only interface. Our target application is composing e-mail by inputting sentences. In the first year, we proposed an interface in which a sentence is input by speech while the "hiragana" character at the head of each phrase in the sentence is input by a hand-written gesture. We implemented a recognition algorithm for hand-written gestures and designed a method for recognizing the simultaneous inputs of the two modes. The proposed method was evaluated in simulation experiments using speech data and hand-written gesture data that were recorded independently, and it proved effective. In the second year, we constructed a recording system for the two input modes and recorded 530 sentences from ten subjects. To integrate the two modes, we employed a two-pass process in which a word graph generated by speech recognition in the first pass is used to integrate the two modes in the second pass. The proposed method improved recognition accuracy by 2.6 points over speech recognition alone. As future work, a method for optimizing the weights between the two modes should be developed. We also plan to develop a demonstration system that works in real time and to evaluate it in noisy environments.
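The second-pass integration described in the abstract can be illustrated with a simplified sketch. Here a plain N-best list stands in for the word graph, and each speech hypothesis is rescored by adding, with a mode weight, the hand-written recognition scores of the hiragana character at the head of each phrase. All function names, data shapes, and the weight value are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of two-mode integration: rescore first-pass speech
# hypotheses with hand-written character scores for phrase-initial hiragana.
# An N-best list is used here as a simplification of the word graph.
import math

def rescore(nbest, gesture_logprobs, weight=0.5):
    """Return the best hypothesis under the combined speech+gesture score.

    nbest: list of (speech_logprob, phrases), where phrases is a list of
           phrase strings whose first character is a hiragana.
    gesture_logprobs: one dict per phrase position, mapping a candidate
           initial character to its hand-written recognition log-probability.
    weight: assumed relative weight of the gesture mode (a tuning knob the
           report lists as future work).
    """
    best, best_score = None, -math.inf
    for speech_lp, phrases in nbest:
        if len(phrases) != len(gesture_logprobs):
            continue  # phrase count must match the number of gestures
        gesture_lp = sum(
            lp.get(p[0], -math.inf)  # score of the phrase-initial character
            for p, lp in zip(phrases, gesture_logprobs)
        )
        total = speech_lp + weight * gesture_lp
        if total > best_score:
            best, best_score = " ".join(phrases), total
    return best, best_score

# Usage: the gesture scores favor the first hypothesis' initial characters,
# overturning the speech-only ranking (where -9.0 beats -10.0).
nbest = [(-10.0, ["きょう", "いきます"]), (-9.0, ["きよう", "ゆきます"])]
gestures = [{"き": -0.1}, {"い": -0.2, "ゆ": -3.0}]
best, score = rescore(nbest, gestures)
```

This captures the key effect reported: hand-written gesture evidence constrains phrase-initial characters and can correct speech recognition errors that survive the first pass.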
|
Research Products
(4 results)