2004 Fiscal Year Final Research Report Summary
Construction of Multimodal Emotion Representation Model for Computer Animation
Project/Area Number | 14208031 |
Research Category | Grant-in-Aid for Scientific Research (A) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Waseda University |
Principal Investigator |
SHIRAI Katsuhiko Waseda University, Faculty of Social Sciences, Professor; Faculty of Science and Engineering, Professor (10063702)
|
Co-Investigator (Kenkyū-buntansha) |
KOBAYASHI Tetsunori Waseda University, Faculty of Social Sciences, Professor; Faculty of Science and Engineering, Professor (30162001)
YONEYAMA Masahide Toyo University, Faculty of Engineering, Professor (60277358)
YAMAZAKI Yoshio Waseda University, Graduate School of Global Information and Telecommunication Studies, Professor (10257199)
OHIRA Shigeki Nagoya University, EcoTopia Science Institute, Research Associate; Information Media Education Center, Research Associate (60339695)
MURAKAMI Makoto Toyo University, Faculty of Engineering, Lecturer (80329119)
|
Project Period (FY) | 2002 – 2004 |
Keywords | computer animation / multimodal emotion representation model / computer graphics / character animation / prosody control / motion control / motion generation |
Research Abstract |
To clarify how emotion appears in speech, we analyzed recordings of rakugo (traditional Japanese comic storytelling), which is among the most natural and emotionally expressive kinds of speech data. The analysis showed that emotion-related variation in speech appears mainly at the end of an utterance. We then focused on laughter as a physiological expression of emotion. This analysis showed that fundamental frequency (F0) and phoneme timing are the basic features that lead listeners to perceive a voice as laughing. Next, to generate emotional motion from language instructions, we constructed an emotion representation model in which the relation between emotional words and motion is described as a binary tree. We then implemented a virtual actor system consisting of two components: an emotion representation component, which generates the target motion from language instructions using the emotion representation model, and an emotion learning component, which updates the model when an unknown word is given. An evaluation experiment showed that the system generates appropriate emotional motion. Finally, to clarify the relation between video and sound signals and the emotion perceived from them, we analyzed audiovisual recordings of emotional speech. The results showed that speakers express the level of an emotion through changes in voice rather than in facial expression, while listeners recognize the kind of emotion from the speaker's facial expression and perceive its level from the speaker's voice.
|