2000 Fiscal Year Final Research Report Summary

Study on Integrated Processing of Speech and Gesture in Multimodal Communication

Research Project

Project/Area Number	10480083
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	General
Research Field	Information systems (including information library science)
Research Institution	Waseda University
Principal Investigator	SHIRAI Katsuhiko Waseda University, School of Science and Engineering, Professor (10063702)
Co-Investigator(Kenkyū-buntansha)	YAMASAKI Yoshio Waseda University, Graduate School of Global Information and Telecommunication Studies, Professor (10257199); HASHIMOTO Shuji Waseda University, School of Science and Engineering, Professor (60063806); KOBAYASHI Tetsunori Waseda University, School of Science and Engineering, Professor (30162001); OKAWA Shigeki Chiba Institute of Technology, Department of Information and Network Science, Associate Professor (40306395)
Project Period (FY)	1998 – 2000
Keywords	Multimodal Communication / Gesture Recognition / Speech Recognition / Partly-Hidden Markov Model / Multi-Person Conversation / Dialogue Control / Misunderstanding Detection / Domain Independent Platform
Research Abstract	The purpose of this research is to develop a multimodal communication system that can recognize multimodal information such as speech and gesture in natural dialogue, understand human intention by integrating that information, and respond appropriately. First, it is necessary to clarify how human intention is understood through the integration of multimodal information and how responses are produced through multiple modalities. We therefore analyzed acoustic features of speech, such as fillers, and the roles of gestures, such as head movement, in various natural human dialogues. We then studied speech and gesture recognition algorithms, which are fundamental techniques for a multimodal communication system. We propose a recombination strategy for multi-band automatic speech recognition that gives more accurate recognition, especially in noisy acoustic environments. We also propose a speech decoder in which the language models are modified to handle the timing of turn taking and in which speaker models are also used. We apply a new pattern matching method, the Partly-Hidden Markov Model, in which the first state is hidden and the second is observable, to gesture recognition. We also propose face extraction and pose detection methods for recognizing head movement. Finally, we implemented the multimodal communication model in a human-machine dialogue system. This system uses a generalization method that considers the trade-off between the variety of dialogues and the ease of describing rules, and it provides a domain-independent platform. It also has a spoken dialogue control model for improving dialogue efficiency and a dialogue management model for detecting misunderstandings in spoken dialogue systems.

Research Products
(26 results)

All Publications (26 results)

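[Publications] M.Yokoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-changing For Spoken Dialogue Interface of Humanoid Robot" (in Japanese) Transactions of IPSJ. Vol.40, No.2. 487-496 (1999)
- Description
  From the Final Research Report Summary (Japanese)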
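[Publications] N.Murai, T.Kobayashi: "Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking" (in Japanese) Transactions of IEICE D-II. J83, No.11. 2465-2472 (2000)
- Description
  From the Final Research Report Summary (Japanese)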
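[Publications] K.Masumitsu, T.Kobayashi: "Partly-Hidden Markov Model and Its Application To Gesture Recognition" (in Japanese) Transactions of IPSJ. Vol.41, No.11. 3060-3069 (2000)
- Description
  From the Final Research Report Summary (Japanese)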
[Publications] H.Kikuchi, K.Shirai: "Controlling Gaze of Humanoid in Communication with Human" Proc. of International Conference on Intelligent Robots and Systems (IROS). Vol.1. 255-260 (1998)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] H.Kikuchi, K.Shirai: "Multimodal Communication Between Human and Robot" Proc. of International Wireless and Telecommunications Symposium (IWTS). 322-325 (1998)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] M.Yokoyama, K.Shirai: "Use of Non-Verbal Information in Communication between Human and Robot" Proc. of International Conference on Spoken Language Processing (ICSLP). 2351-2354 (1998)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] H.Kikuchi, K.Shirai: "Controlling Dialogue Strategy According to Performance of Processes" ESCA Workshop, Session5.2. 85-88 (1999)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] S.Okawa, K.Shirai: "A Recombination Strategy for Multi-band Speech Recognition Based on Mutual Information Criterion" 6th European Conference on Speech Communication and Technology : EUROSPEECH'99. Vol.2. 603-606 (1999)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] Y.Matsusaka, T.Kobayashi: "Multi-person Conversation Robot using Multi-modal Interface" SCI'99. Vol.7. 450-455 (1999)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] N.Murai, T.Kobayashi: "DICTATION OF MULTIPARTY CONVERSATION USING STATISTICAL TURN TAKING MODEL AND SPEAKER MODEL" Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vol.3. 1575-1578 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] K.Aoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-change for Spoken Dialogue" 2000 IEEE International Conference on Systems Man and Cybernetics (SMC2000). 1354-1359 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] K.Aoyama, K.Shirai: "DESIGNING A DOMAIN INDEPENDENT PLATFORM OF SPOKEN DIALOGUE SYSTEM" Proc. of International Conference on Spoken Language Processing (ICSLP). (CD-ROM). (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] M.Murakami, K.Shirai: "Accurate Extraction of Human Face Area using Subspace Method and Genetic Algorithm" Proc. of International Conference Multimedia and Expo. 411-414 (2000)
- Description
  From the Final Research Report Summary (Japanese)
[Publications] M.Yokoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-changing For Spoken Dialogue Interface of Humanoid Robot" Transactions of IPSJ. Vol.40, No.2. 487-496 (1999)
- Description
  From the Final Research Report Summary (English)
[Publications] N.Murai, T.Kobayashi: "Dictation of Multiparty Conversation Considering Speaker Individuality and Turn Taking" Transactions of IEICE. D-II, Vol.J83-D-II, No.11. 2465-2472 (2000)
- Description
  From the Final Research Report Summary (English)
[Publications] K.Masumitsu, T.Kobayashi: "Partly-Hidden Markov Model and Its Application To Gesture Recognition" Transactions of IPSJ. Vol.41, No.11. 3060-3069 (2000)
- Description
  From the Final Research Report Summary (English)
[Publications] H.Kikuchi, K.Shirai: "Controlling Gaze of Humanoid in Communication with Human" Proc. of International Conference on Intelligent Robots and Systems (IROS). Vol.1. 255-260 (1998)
- Description
  From the Final Research Report Summary (English)
[Publications] H.Kikuchi, K.Shirai: "Multimodal Communication Between Human and Robot" Proc. of International Wireless and Telecommunications Symposium (IWTS). 322-325 (1998)
- Description
  From the Final Research Report Summary (English)
[Publications] M.Yokoyama, K.Shirai: "Use of Non-Verbal Information in Communication between Human and Robot" Proc. of International Conference on Spoken Language Processing (ICSLP). 2351-2354 (1998)
- Description
  From the Final Research Report Summary (English)
[Publications] H.Kikuchi, K.Shirai: "Controlling Dialogue Strategy According to Performance of Processes" ESCA Workshop. Session5.2. 85-88 (1999)
- Description
  From the Final Research Report Summary (English)
[Publications] S.Okawa, K.Shirai: "A Recombination Strategy for Multi-band Speech Recognition Based on Mutual Information Criterion" 6th European Conference on Speech Communication and Technology : EUROSPEECH'99. Vol.2. 603-606 (1999)
- Description
  From the Final Research Report Summary (English)
[Publications] Y.Matsusaka, T.Kobayashi: "Multi-person Conversation Robot using Multi-modal Interface" SCI'99. Vol.7. 450-455 (1999)
- Description
  From the Final Research Report Summary (English)
[Publications] N.Murai, T.Kobayashi: "DICTATION OF MULTIPARTY CONVERSATION USING STATISTICAL TURN TAKING MODEL AND SPEAKER MODEL" Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Vol.3. 1575-1578 (2000)
- Description
  From the Final Research Report Summary (English)
[Publications] K.Aoyama, K.Shirai: "Controlling Non-verbal Information in Speaker-change for Spoken Dialogue" 2000 IEEE International Conference on Systems Man and Cybernetics (SMC2000). 1354-1359 (2000)
- Description
  From the Final Research Report Summary (English)
[Publications] K.Aoyama, K.Shirai: "DESIGNING A DOMAIN INDEPENDENT PLATFORM OF SPOKEN DIALOGUE SYSTEM" Proc. of International Conference on Spoken Language Processing (ICSLP), CD-ROM. (2000)
- Description
  From the Final Research Report Summary (English)
[Publications] M.Murakami, K.Shirai: "Accurate Extraction of Human Face Area using Subspace Method and Genetic Algorithm" Proc. of International Conference Multimedia and Expo. 411-414 (2000)
- Description
  From the Final Research Report Summary (English)
