Budget Amount
¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2003: ¥700,000 (Direct Cost: ¥700,000)
Fiscal Year 2002: ¥1,400,000 (Direct Cost: ¥1,400,000)
Fiscal Year 2001: ¥1,500,000 (Direct Cost: ¥1,500,000)
Research Abstract
Multimodal systems have the potential to greatly improve the flexibility, robustness, efficiency, universal accessibility, and naturalness of human-machine interaction. This study investigated two multimodal techniques related to the integration of speech and gaze, because humans naturally use these two modalities to communicate with each other.

The first study concerned a gaze-and-mouse multimodal user interface. Eye gaze naturally indicates a person's attention and interests, and eye movement is rapid, so gaze information can provide a quick, natural, and convenient input method. To improve the accuracy of gaze input, a gaze-and-mouse complementary method was proposed: the gaze modality improves speed, either by selecting a target directly or by shortening the mouse's travel distance, while the mouse improves accuracy when the gaze fixation lands far from the target.

The second study concerned gaze-and-speech multimodal input methodologies. We use these two modalities naturally and simultaneously in daily life, especially when determining deictic referents in a spoken dialogue. However, recognition ambiguities in speech and gaze inputs are inevitable. Since both gaze and speech are error-prone as stand-alone modalities, the goal of this study was to build an effective and robust human-computer interaction system from their combination. The features of the speech-and-gaze multimodal system are as follows:
・The multimodal architecture supports mutual correction of recognition errors between the component modalities: speech recognition errors can be corrected by gaze, and vice versa. Even if both gaze and speech recognition errors occur, the correct multimodal result can still be obtained.
・Ambiguities in the speech signal can be resolved by gaze information. The multimodal architecture eliminates the lengthy definite descriptions that would be needed to refer to unnamed objects if speech alone were used, so gaze information significantly simplifies the user's speech. Simplified speech causes fewer recognition errors, facilitates both error avoidance and user acceptance, and provides a natural and intuitive way to interact with the computer.
・The simplified speech also improves interaction speed, providing users with an efficient multimodal interface.
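The gaze-and-mouse complementary method from the first study can be illustrated with a minimal sketch. The function names, the warp threshold, and the coordinates below are illustrative assumptions, not the study's actual implementation: the cursor warps to the gaze fixation only when the remaining distance is large (gaze supplies speed), and the mouse delta then handles the final fine positioning (mouse supplies accuracy).

```python
import math
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

def distance(a: Point, b: Point) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)

def next_cursor(cursor: Point, gaze: Point, mouse_delta: Point,
                warp_threshold: float = 200.0) -> Point:
    """Combine gaze and mouse input for one update step.

    If the cursor is far from the gaze fixation, warp it to the fixation
    (gaze shortens the mouse's travel distance); then apply the mouse
    delta, which provides the fine, accurate positioning.
    """
    if distance(cursor, gaze) > warp_threshold:
        cursor = Point(gaze.x, gaze.y)  # long movement: let gaze do the jump
    return Point(cursor.x + mouse_delta.x, cursor.y + mouse_delta.y)

# Far target: cursor warps to the fixation, then the mouse refines.
p = next_cursor(Point(0, 0), Point(500, 500), Point(3, -2))
print(p)  # → Point(x=503, y=498)

# Near target: no warp, the mouse alone moves the cursor.
q = next_cursor(Point(490, 490), Point(500, 500), Point(3, -2))
print(q)  # → Point(x=493, y=488)
```

The threshold trades off the two modalities: too small and gaze jitter disturbs fine pointing; too large and the speed benefit of gaze is lost.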
|
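The mutual-correction idea from the second study, where a jointly most probable referent is chosen even when the top hypothesis of either modality is wrong, can be sketched as a simple score fusion. The object names, confidence values, and equal weighting below are illustrative assumptions only.

```python
# Hypothetical sketch of speech-gaze fusion: combine n-best speech
# hypotheses with gaze fixation scores so that the referent with the
# best joint score wins, even if the recognizer's top hypothesis is wrong.

def fuse(speech_nbest, gaze_scores, w_speech=0.5, w_gaze=0.5):
    """Return the referent with the highest combined score.

    speech_nbest: list of (object_name, confidence) from the recognizer
    gaze_scores:  dict mapping object_name -> normalized fixation score
    """
    best, best_score = None, float("-inf")
    for name, s_conf in speech_nbest:
        g_conf = gaze_scores.get(name, 0.0)  # unfixated objects score 0
        score = w_speech * s_conf + w_gaze * g_conf
        if score > best_score:
            best, best_score = name, score
    return best

# The recognizer ranks "red box" first, but the user was fixating the
# blue box; the gaze score corrects the speech recognition error.
speech_nbest = [("red box", 0.55), ("blue box", 0.45)]
gaze_scores = {"blue box": 0.9, "red box": 0.1}
print(fuse(speech_nbest, gaze_scores))  # → blue box
```

The same mechanism works in the other direction: a noisy gaze signal spread over several objects is disambiguated by the speech hypothesis that names one of them.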