Study on Integrated Processing of Speech and Gesture in Multimodal Communication

Research Project

Project/Area Number	10480083
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Information Systems (including Information Library Science)
Research Institution	Waseda University
Principal Investigator	SHIRAI Katsuhiko, Waseda University, School of Science and Engineering, Professor (10063702)
Co-Investigator (Kenkyū-buntansha)	YAMASAKI Yoshio, Waseda University, Graduate School of Global Information and Telecommunication Studies, Professor (10257199); HASHIMOTO Shuji, Waseda University, School of Science and Engineering, Professor (60063806); KOBAYASHI Tetsunori, Waseda University, School of Science and Engineering, Professor (30162001); OKAWA Shigeki, Chiba Institute of Technology, Department of Information and Network Science, Associate Professor (40306395)
Project Period (FY)	1998 – 2000
Project Status	Completed (Fiscal Year 2000)
Budget Amount	¥9,200,000 (Direct Cost: ¥9,200,000); Fiscal Year 2000: ¥1,500,000 (Direct Cost: ¥1,500,000); Fiscal Year 1999: ¥3,600,000 (Direct Cost: ¥3,600,000); Fiscal Year 1998: ¥4,100,000 (Direct Cost: ¥4,100,000)
Keywords	Multimodal Communication / Gesture Recognition / Speech Recognition / Partly-Hidden Markov Model / Multi-Person Conversation / Dialogue Control / Misunderstanding Detection / Domain Independent Platform / Statistical Turn-Taking Model / Subspace Method / Face Image Extraction / Multi-Band Speech Recognition / Pose Estimation / General-Purpose Platform for Spoken Dialogue Systems / Spoken Dialogue System / Dialogue Corpus / Multimodal / Hidden Markov Model / Face Direction Recognition
Research Abstract	The purpose of this research is to develop a multimodal communication system that recognizes multimodal information such as speech and gesture in natural dialogue, understands human intention by integrating that information, and responds appropriately. First, it is necessary to clarify how human intention is understood through the integration of multimodal information and how responses are produced through multiple modalities. We therefore analyzed the acoustic features of speech, such as fillers, and the roles of gestures, such as head movement, in a variety of natural human dialogues. We then studied speech and gesture recognition algorithms, which are the fundamental techniques for a multimodal communication system. We propose a recombination strategy for multi-band automatic speech recognition that gives more accurate recognition, especially in noisy acoustic environments. We also propose a speech decoder in which the language models are modified to handle the timing of turn taking and in which speaker models are also used. We apply a new pattern matching method, the Partly-Hidden Markov Model, in which the first state is hidden and the second is observable, to gesture recognition. We also propose face extraction and pose detection methods for recognizing head movement. Finally, we implemented the multimodal communication model in a human-machine dialogue system. This system uses a generalization method that considers the trade-off between the variety of dialogue and the ease of describing rules, and it provides a domain-independent platform. It also has a spoken dialogue control model for improving dialogue efficiency and a dialogue management model for detecting misunderstanding in spoken dialogue systems.

Report

(4 results)

2000 Annual Research Report, Final Research Report Summary
1999 Annual Research Report
1998 Annual Research Report

Research Products
(44 results)

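[Publications] M.Yokoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-changing for Spoken Dialogue Interface of Humanoid Robot" Transactions of IPSJ. Vol.40, No.2. 487-496 (1999)
- Description
  From the Final Research Report Summary (Japanese)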
- Related Report
  2000 Final Research Report Summary
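[Publications] N.Murai, T.Kobayashi: "Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking" Transactions of IEICE D-II. Vol.J83, No.11. 2465-2472 (2000)
- Description
  From the Final Research Report Summary (Japanese)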
- Related Report
  2000 Final Research Report Summary
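[Publications] K.Masumitsu, T.Kobayashi: "Partly-Hidden Markov Model and Its Application to Gesture Recognition" Transactions of IPSJ. Vol.41, No.11. 3060-3069 (2000)
- Description
  From the Final Research Report Summary (Japanese)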
- Related Report
  2000 Final Research Report Summary
[Publications] H.Kikuchi, K.Shirai: "Controlling Gaze of Humanoid in Communication with Human" Proc. of International Conference on Intelligent Robots and Systems (IROS). Vol.1. 255-260 (1998)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] H.Kikuchi, K.Shirai: "Multimodal Communication Between Human and Robot" Proc. of International Wireless and Telecommunications Symposium (IWTS). 322-325 (1998)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] M.Yokoyama, K.Shirai: "Use of Non-Verbal Information in Communication between Human and Robot" Proc. of International Conference on Spoken Language Processing (ICSLP). 2351-2354 (1998)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] H.Kikuchi, K.Shirai: "Controlling Dialogue Strategy According to Performance of Processes" ESCA Workshop. Session 5.2. 85-88 (1999)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] S.Okawa, K.Shirai: "A Recombination Strategy for Multi-band Speech Recognition Based on Mutual Information Criterion" 6th European Conference on Speech Communication and Technology: EUROSPEECH'99. Vol.2. 603-606 (1999)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] Y.Matsusaka, T.Kobayashi: "Multi-person Conversation Robot using Multi-modal Interface" SCI'99. Vol.7. 450-455 (1999)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] N.Murai, T.Kobayashi: "DICTATION OF MULTIPARTY CONVERSATION USING STATISTICAL TURN TAKING MODEL AND SPEAKER MODEL" Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vol.3. 1575-1578 (2000)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] K.Aoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-change for Spoken Dialogue" 2000 IEEE International Conference on Systems, Man and Cybernetics (SMC2000). 1354-1359 (2000)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] K.Aoyama, K.Shirai: "DESIGNING A DOMAIN INDEPENDENT PLATFORM OF SPOKEN DIALOGUE SYSTEM" Proc. of International Conference on Spoken Language Processing (ICSLP). (CD-ROM). (2000)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] M.Murakami, K.Shirai: "Accurate Extraction of Human Face Area using Subspace Method and Genetic Algorithm" Proc. of International Conference Multimedia and Expo. 411-414 (2000)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  2000 Final Research Report Summary
[Publications] M.Yokoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-changing For Spoken Dialogue Interface of Humanoid Robot" Transactions of IPSJ. Vol.40, No.2. 487-496 (1999)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] N.Murai, T.Kobayashi: "Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking" Transactions of IEICE D-II. Vol.J83-D-II, No.11. 2465-2472 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] K.Masumitsu, T.Kobayashi: "Partly-Hidden Markov Model and Its Application To Gesture Recognition" Transactions of IPSJ. Vol.41, No.11. 3060-3069 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] H.Kikuchi, K.Shirai: "Controlling Gaze of Humanoid in Communication with Human" Proc. of International Conference on Intelligent Robots and Systems (IROS). Vol.1. 255-260 (1998)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] H.Kikuchi, K.Shirai: "Multimodal Communication Between Human and Robot" Proc. of International Wireless and Telecommunications Symposium (IWTS). 322-325 (1998)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] M.Yokoyama, K.Shirai: "Use of Non-Verbal Information in Communication between Human and Robot" Proc. of International Conference on Spoken Language Processing (ICSLP). 2351-2354 (1998)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] H.Kikuchi, K.Shirai: "Controlling Dialogue Strategy According to Performance of Processes" ESCA Workshop. Session 5.2. 85-88 (1999)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] S.Okawa, K.Shirai: "A Recombination Strategy for Multi-band Speech Recognition Based on Mutual Information Criterion" 6th European Conference on Speech Communication and Technology: EUROSPEECH'99. Vol.2. 603-606 (1999)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] Y.Matsusaka, T.Kobayashi: "Multi-person Conversation Robot using Multi-modal Interface" SCI'99. Vol.7. 450-455 (1999)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] N.Murai, T.Kobayashi: "DICTATION OF MULTIPARTY CONVERSATION USING STATISTICAL TURN TAKING MODEL AND SPEAKER MODEL" Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vol.3. 1575-1578 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] K.Aoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-change for Spoken Dialogue" 2000 IEEE International Conference on Systems, Man and Cybernetics (SMC2000). 1354-1359 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] K.Aoyama, K.Shirai: "DESIGNING A DOMAIN INDEPENDENT PLATFORM OF SPOKEN DIALOGUE SYSTEM" Proc. of International Conference on Spoken Language Processing (ICSLP). (CD-ROM). (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] M.Murakami, K.Shirai: "Accurate Extraction of Human Face Area using Subspace Method and Genetic Algorithm" Proc. of International Conference Multimedia and Expo. 411-414 (2000)
- Description
  From the Final Research Report Summary (English)
- Related Report
  2000 Final Research Report Summary
[Publications] Kazumi Aoyama: "Controlling Non-verbal Information in Speaker-change for Spoken Dialogue" IEEE Proc. of SMC2000. 1354-1359 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Kazumi Aoyama: "DESIGNING A DOMAIN INDEPENDENT PLATFORM OF SPOKEN DIALOGUE SYSTEM" Proc. of ICSLP 2000. CD-ROM (2000)
- Related Report
  2000 Annual Research Report
[Publications] Noriyuki Murai: "Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking" Transactions of IEICE D-II. Vol.J83, No.11. 2465-2472 (2000)
- Related Report
  2000 Annual Research Report
[Publications] K.Masumitsu: "Partly-Hidden Markov Model and Its Application to Gesture Recognition" Transactions of IPSJ. Vol.41, No.11. 3060-3069 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Makoto Murakami: "Accurate Extraction of Human Face Area using Subspace Method and Genetic Algorithm" Proc. of International Conference Multimedia and Expo. 411-414 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Noriyuki Murai: "DICTATION OF MULTIPARTY CONVERSATION USING STATISTICAL TURN TAKING MODEL AND SPEAKER MODEL" Proc. of ICASSP 2000. Vol.3. 1575-1578 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Hideaki Kikuchi et al.: "Controlling Dialogue Strategy According to Performance of Processes" Proc. of ESCA Workshop. 85-88 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Shigeki Okawa et al.: "A Recombination Strategy for Multi-band Speech Recognition Based on Mutual Information Criterion" Proc. of EUROSPEECH'99. Vol.2. 603-606 (1999)
- Related Report
  1999 Annual Research Report
[Publications] 中島雄大 et al.: "Evaluation of the Information Content of Sub-band Features for Multi-band Speech Recognition" IEICE Technical Report. SP99-97. 25-30 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Kazumi Aoyama et al.: "A Study of a General-Purpose Platform for Spoken Dialogue Systems" IPSJ SIG Technical Report. SLP-30. 7-12 (2000)
- Related Report
  1999 Annual Research Report
[Publications] Yosuke Matsusaka et al.: "Multi-person Conversation via Multi-modal Interface" Proc. of EUROSPEECH'99. Vol.4. 1723-1726 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Shigeki Ohira: "Proposal and Evaluation of Significant Word Selection Method" Proc. of the First NTCIR Workshop on R-JTRTR. 109-116 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Hideaki Kikuchi, Katsuhiko Shirai: "Controlling Gaze of Humanoid in Communication with Human" Proc. of International Conference on Intelligent Robots and Systems. Vol.1. 255-260 (1998)
- Related Report
  1998 Annual Research Report
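[Publications] Masao Yokoyama, Katsuhiko Shirai: "Controlling Non-verbal Information in Speaker-changing for Spoken Dialogue Interface of Humanoid Robot" Transactions of IPSJ. February issue. (1999)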
- Related Report
  1998 Annual Research Report
[Publications] Masao Yokoyama, Katsuhiko Shirai: "Use of Non-Verbal Information in Communication between Human and Robot" Proc. of International Conference on Spoken Language Processing. 2351-2354 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Hideaki Kikuchi, Katsuhiko Shirai: "Multimodal Communication Between Human and Robot" Proc. of International Wireless and Telecommunications Symposium. 322-325 (1998)
- Related Report
  1998 Annual Research Report
[Publications] K.Masumitsu, K.Shirai: "Partly-Hidden Markov Model and Its Application to Gesture Recognition" IEICE Technical Report. PRMU97-203. 35-62 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Yukinori Takubo, Katsuhiko Shirai: "Iwanami Lecture Series: The Science of Language, Vol.2, Speech" Iwanami Shoten. 249 (1998)
- Related Report
  1998 Annual Research Report
