2006 Fiscal Year Final Research Report Summary
Study on Computational Auditory Scene Analysis for Humanoids by Active Audition
Project/Area Number | 15200015
Research Category | Grant-in-Aid for Scientific Research (A)
Allocation Type | Single-year Grants
Section | General
Research Field | Perception information processing / Intelligent robotics
Research Institution | Kyoto University
Principal Investigator | OKUNO Hiroshi, Kyoto University, Graduate School of Informatics, Professor (60318201)
Co-Investigators (Kenkyū-buntansha) |
KAWAHARA Tatsuya, Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)
SATO Satoshi, Nagoya University, Graduate School of Engineering, Professor (30205918)
KOMATANI Kazunori, Kyoto University, Graduate School of Informatics, Assistant Professor (40362579)
WADA Toshikazu, Wakayama University, Faculty of Systems Engineering, Professor (00231035)
GOTO Masataka, National Institute of Advanced Industrial Science and Technology (AIST), Information Processing Research Division, Senior Researcher (20357007)
Project Period (FY) | 2003 – 2006
Keywords | Robot Audition / Computational Auditory Scene Analysis / Audio-Visual Integration / Music Information Processing / Automatic Onomatopoeia Recognition / Missing Feature Theory / Automatic Missing Feature Mask Generation / Genetic Algorithm
Research Abstract |
Robot audition is the capability of a humanoid to hear sounds with its own microphones (ears) mounted on its body. Since a humanoid in the real world usually hears a mixture of sounds, Computational Auditory Scene Analysis (CASA), whose essential functions are sound source localization, sound source separation, and recognition of the separated sounds, is required to realize the capability of listening to several things simultaneously, like Prince Shotoku ("Shotoku-Taishi"), who is said to have listened to several petitioners at once. We obtained the following research results:
1) CASA functions with less prior information: The missing-feature-based approach integrated sound source localization (MUSIC or a steered beamformer), sound source separation (Geometric Source Separation or Independent Component Analysis), and automatic speech recognition (Multi-band Julius or CTK) by developing automatic missing feature mask generation. The whole system was implemented on the FlowDesigner architecture, so that three simultaneous utterances were recognized with a latency of 1.9 seconds. This result confirmed the validity of our approach on different humanoids, including SIG2, Robovie-R2, and ASIMO.
2) Distance-based behavior selection: An interaction strategy based on the distance between the humanoid and people, following proxemics, was devised to select an appropriate interaction partner. The system, implemented on the SIG2 humanoid, was demonstrated for three months at the Kyoto University Museum, confirming its effectiveness in multi-person interaction.
3) Robust face tracking: Face tracking based on color-target detection with a nearest neighbor classifier was developed to improve the performance of tracking a moving talker (a minimal sketch of nearest-neighbor color classification follows this list).
4) Music information technologies for polyphonic music, including musical instrument recognition, drum sound extraction, and singer recognition, were developed so that humanoids can hear music.
5) A user model and recovery strategies for speech recognition errors were developed to improve the usability of a multi-domain spoken dialogue system.
6) An automatic onomatopoeia recognition system was developed to exploit environmental sounds in human-humanoid interaction.
Future work includes the design and development of robot audition based on CASA.
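To illustrate the nearest-neighbor idea in item 3, below is a minimal sketch, not the actual detector: each pixel is labeled target or background according to which stored color sample is nearest in RGB space. The sample sets, the Euclidean metric, and the function names are assumptions for illustration.

```python
import numpy as np

def nn_color_target_mask(image, target_colors, background_colors):
    """Label each pixel as target iff its nearest neighbor among the
    stored color samples is a target sample (Euclidean distance in RGB).

    image: (H, W, 3) float array; *_colors: (N, 3) sample arrays.
    """
    pixels = image.reshape(-1, 1, 3)                              # (H*W, 1, 3)
    samples = np.concatenate([target_colors, background_colors])  # (N, 3)
    dists = np.linalg.norm(pixels - samples[None, :, :], axis=2)  # (H*W, N)
    nearest = np.argmin(dists, axis=1)
    is_target = nearest < len(target_colors)
    return is_target.reshape(image.shape[:2])

# Toy usage: detect reddish pixels against dark/bright backgrounds.
img = np.random.rand(4, 4, 3)
mask = nn_color_target_mask(
    img,
    target_colors=np.array([[0.9, 0.1, 0.1]]),
    background_colors=np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]),
)
```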
Research Products (80 results)