Project/Area Number | 10480083 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Information Systems Science (including Information Library Science) |
Research Institution | Waseda University |
Principal Investigator | SHIRAI Katsuhiko, Waseda University, School of Science and Engineering, Professor (10063702) |
Co-Investigator (Kenkyū-buntansha) |
YAMASAKI Yoshio, Waseda University, Graduate School of Global Information and Telecommunication Studies, Professor (10257199)
HASHIMOTO Shuji, Waseda University, School of Science and Engineering, Professor (60063806)
KOBAYASHI Tetsunori, Waseda University, School of Science and Engineering, Professor (30162001)
OKAWA Shigeki, Chiba Institute of Technology, Department of Information and Network Science, Associate Professor (40306395)
|
Project Period (FY) | 1998 – 2000 |
Project Status | Completed (Fiscal Year 2000) |
Budget Amount | ¥9,200,000 (Direct Cost: ¥9,200,000)
Fiscal Year 2000: ¥1,500,000 (Direct Cost: ¥1,500,000)
Fiscal Year 1999: ¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 1998: ¥4,100,000 (Direct Cost: ¥4,100,000)
|
Keywords | Multimodal Communication / Gesture Recognition / Speech Recognition / Partly-Hidden Markov Model / Multi-Person Conversation / Dialogue Control / Misunderstanding Detection / Domain Independent Platform / Multi-Speaker Dialogue / Statistical Turn-Taking Model / Subspace Method / Facial Image Extraction / Multi-Band Speech Recognition / Pose Estimation / General-Purpose Spoken Dialogue Platform / Spoken Dialogue System / Dialogue Corpus / Multimodal / Hidden Markov Model / Face Direction Recognition |
Research Abstract |
The purpose of this research is to develop a multimodal communication system that can recognize multimodal information, such as speech and gestures, in natural dialogue, understand human intention by integrating that information, and respond to the human appropriately. First, it was necessary to clarify how human intention is understood through the integration of multimodal information and how responses are produced through multiple modalities. We therefore analyzed the acoustic features of speech, such as fillers, and the roles of gestures, such as head movement, across a variety of natural human dialogues.

We then studied speech and gesture recognition algorithms, the fundamental techniques of a multimodal communication system. We propose a recombination strategy for multi-band automatic speech recognition that gives more accurate recognition, especially in noisy acoustic environments, and a speech decoder in which the language models are modified to handle the timing of turn-taking and speaker models are also utilized. For gesture recognition we apply a new pattern-matching method, the Partly-Hidden Markov Model, in which the first state process is hidden and the second is observable, and we propose face extraction and pose detection methods to recognize head movement.

Finally, we implemented the multimodal communication model in a human-machine dialogue system. The system uses a generalization method that considers the trade-off between the variety of the dialogue and the ease of describing rules, and it provides a domain-independent platform. It also incorporates a spoken dialogue control model that improves dialogue efficiency and a dialogue management model that detects misunderstandings in the spoken dialogue system.
|
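The "partly hidden" idea above, in which each time step carries both a hidden state and a directly observable one, can be illustrated with a forward-algorithm sketch. This is a minimal toy model, not the authors' implementation: the two-state sizes, the transition and emission tables, and the assumption that the observable state sequence is read straight from the data are all illustrative choices.

```python
import numpy as np

# Toy Partly-Hidden Markov Model: each step has a hidden state q and an
# observable state s.  Only q is marginalized in the forward recursion;
# s comes directly from the data.  All parameter values are made up.

n_q = 2  # number of hidden states

# trans[s_prev, q_prev, q] = P(q_t | q_{t-1}, s_{t-1})
trans = np.array([[[0.7, 0.3],
                   [0.4, 0.6]],
                  [[0.5, 0.5],
                   [0.2, 0.8]]])

# emit[q, s, o] = P(o_t | q_t, s_t), with two observation symbols
emit = np.array([[[0.9, 0.1],
                  [0.6, 0.4]],
                 [[0.3, 0.7],
                  [0.2, 0.8]]])

init = np.array([0.6, 0.4])  # P(q_1)

def phmm_likelihood(obs, s_seq):
    """Forward algorithm marginalizing only the hidden chain q;
    the observable state sequence s_seq is taken as given."""
    alpha = init * emit[:, s_seq[0], obs[0]]
    for t in range(1, len(obs)):
        # sum over q_{t-1}, condition the transition on s_{t-1}
        alpha = alpha @ trans[s_seq[t - 1]] * emit[:, s_seq[t], obs[t]]
    return alpha.sum()

p = phmm_likelihood([0, 1, 1], [0, 1, 0])
```

Because the observable state is known, the recursion stays as cheap as a plain HMM while the transition and emission probabilities can still depend on it, which is the appeal of the model for gesture data.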