Grant-in-Aid for Scientific Research (A)
|Allocation Type||Single-year Grants|
Perception information processing/Intelligent robotics
|Research Institution||Kyoto University|
OKUNO Hiroshi Kyoto University, Graduate School of Informatics, Professor (60318201)
KAWAHARA Tatsuya Kyoto University, Academic Center for Computing and Media Studies, Professor (00234104)
SATO Satoshi Nagoya University, Graduate School of Engineering, Professor (30205918)
KOMATANI Kazunori Kyoto University, Graduate School of Informatics, Assistant Professor (40362579)
WADA Toshikazu Wakayama University, Faculty of System Engineering, Professor (00231035)
GOTO Masataka National Institute of Advanced Industrial Science and Technology, Information Processing Division, Senior Researcher (20357007)
MIYAHARA Makoto Japan Advanced Institute of Science and Technology, School of Information Science, Professor (00115122)
NAKADAI Kazuhiro Honda Research Institute Japan Co., Ltd., Senior Researcher
|Project Period (FY)||2003 – 2006|
Completed (Fiscal Year 2006)
|Budget Amount
¥51,350,000 (Direct Cost: ¥39,500,000, Indirect Cost: ¥11,850,000)
Fiscal Year 2006: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2005: ¥8,190,000 (Direct Cost: ¥6,300,000, Indirect Cost: ¥1,890,000)
Fiscal Year 2004: ¥15,990,000 (Direct Cost: ¥12,300,000, Indirect Cost: ¥3,690,000)
Fiscal Year 2003: ¥22,750,000 (Direct Cost: ¥17,500,000, Indirect Cost: ¥5,250,000)
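The budget lines are internally consistent: in each fiscal year the direct and indirect costs add up to that year's total, the yearly figures add up to the grand total, and the indirect cost is exactly 30% of the direct cost every year. The short check below recomputes this from the figures listed; the names `BUDGET`, `GRAND_TOTAL`, `check_budget`, and `indirect_rates` are illustrative only.

```python
from fractions import Fraction

# Figures copied from the budget lines, in yen: (direct, indirect, total) per fiscal year.
BUDGET = {
    2003: (17_500_000, 5_250_000, 22_750_000),
    2004: (12_300_000, 3_690_000, 15_990_000),
    2005: (6_300_000, 1_890_000, 8_190_000),
    2006: (3_400_000, 1_020_000, 4_420_000),
}
GRAND_TOTAL = (39_500_000, 11_850_000, 51_350_000)


def check_budget(budget, grand_total):
    """True if every row satisfies direct + indirect == total and the
    per-column sums over all years equal the grand totals."""
    rows_ok = all(d + i == t for d, i, t in budget.values())
    column_sums = tuple(sum(row[k] for row in budget.values()) for k in range(3))
    return rows_ok and column_sums == grand_total


def indirect_rates(budget):
    """Indirect cost divided by direct cost for each year, as exact fractions."""
    return {year: Fraction(i, d) for year, (d, i, _) in budget.items()}


print(check_budget(BUDGET, GRAND_TOTAL))  # True
print(indirect_rates(BUDGET)[2003])       # 3/10
```

Exact `Fraction` arithmetic is used so that the 30% rate is confirmed without floating-point rounding.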
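|Keywords||Robot Audition / Computational Auditory Scene Analysis / Audio-Visual Integration / Music Information Processing / Automatic Onomatopoeia Recognition / Missing Feature Theory / Automatic Missing Feature Mask Generation / Genetic Algorithm / Missing Feature / Color Discriminability / Automatic Mask Generation / High-Fidelity Sound Reproduction System / Interaction Based on Interpersonal Distance / Color Target Detection / Automatic Onomatopoeia Recognition of Environmental Sounds / Automatic Musical Instrument Recognition and Singer Recognition / Epizo Sensor (エピゾセンサー) / Parametric Speaker / Nearest Neighbor Classifier / Flexible Spoken Dialogue System|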
Robot audition is the capability of a humanoid to hear sounds with its own microphones (ears) mounted on its body. Because a humanoid in the real world usually hears a mixture of sounds, it needs Computational Auditory Scene Analysis (CASA), whose essential functions are sound source localization, sound source separation, and recognition of the separated sounds, in order to listen to several things simultaneously, as Prince Shotoku ("Shotoku-Taishi") is said to have listened to ten petitioners at once. We obtained the following research results:
1) CASA functions that require less prior information:
By developing automatic missing-feature mask generation, we integrated sound source localization (MUSIC or a steered beamformer), sound source separation (Geometric Source Separation or Independent Component Analysis), and automatic speech recognition (Multi-band Julius or CTK) in a missing-feature-based framework. The whole system was implemented on the FlowDesigner architecture and recognized three simultaneous utterances with a latency of 1.9 s. These results confirmed the validity of the approach on several humanoids, including SIG2, Robovie-R2, and ASIMO.
2) Distance-based behavior selection:
An interaction strategy based on the distance between the humanoid and people, following Proxemics, was devised to select an appropriate interaction partner. The system, implemented on the SIG2 humanoid, was demonstrated for three months at the Kyoto University Museum, confirming its effectiveness in interaction with multiple people.
3) Robust face tracking based on color-target detection with a nearest neighbor classifier was developed to improve the tracking of moving talkers.
4) Music information processing technologies for polyphonic music, including musical instrument recognition, drum sound extraction, and singer recognition, were developed so that humanoids can listen to music.
5) A user model and recovery from speech recognition errors were developed to improve the usability of a multi-domain spoken dialogue system.
6) An automatic onomatopoeia recognition system was developed so that environmental sounds can be used in humanoid-human interaction.
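Item 1 relies on automatic missing-feature mask generation. As a rough illustration, not the project's implementation, the sketch below marks a spectral bin of a separated channel as reliable when its energy exceeds an estimate of the leakage from the other separated channels, and then scores a feature vector with a diagonal Gaussian using only the reliable bins. For a diagonal Gaussian, marginalizing out the unreliable dimensions amounts to dropping their terms. The leakage model (a fixed fraction `leak_ratio` of the other channels' energy), the threshold, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def leak_based_mask(separated: Sequence[Sequence[float]],
                    leak_ratio: float = 0.25,
                    threshold: float = 0.0) -> List[List[int]]:
    """Hard missing-feature mask for each separated channel.

    separated[c][k] is the energy of channel c in spectral bin k.
    Hypothetical leakage model: channel c contains leak_ratio times the
    energy of every other channel in the same bin. A bin is marked
    reliable (1) when its energy minus that estimate exceeds threshold.
    """
    n_ch = len(separated)
    n_bins = len(separated[0])
    masks = []
    for c in range(n_ch):
        mask = []
        for k in range(n_bins):
            leak = leak_ratio * sum(separated[o][k] for o in range(n_ch) if o != c)
            mask.append(1 if separated[c][k] - leak > threshold else 0)
        masks.append(mask)
    return masks


def marginal_log_likelihood(x, mask, mean, var) -> float:
    """Diagonal-Gaussian log-likelihood over the reliable dimensions only.

    Unreliable dimensions (mask == 0) are marginalized out, which for a
    diagonal covariance means their terms are simply omitted.
    """
    ll = 0.0
    for xi, m, mu, v in zip(x, mask, mean, var):
        if m:
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return ll


masks = leak_based_mask([[10.0, 1.0, 8.0], [2.0, 9.0, 1.0]])
print(masks)  # [[1, 0, 1], [0, 1, 0]]
```

Because the outlier in an unreliable bin never enters the sum, a corrupted bin cannot drag down the score of the correct model, which is the motivation for masking in the first place.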
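Item 2 selects an interaction partner from the distance between the humanoid and each person, following Proxemics. The sketch below classifies distances into Edward T. Hall's four zones, using commonly quoted approximate boundaries of 0.45 m, 1.2 m, and 3.6 m, and picks a partner with a hypothetical priority rule: ignore people in the public zone, prefer the nearer zone, then a person who is talking, then the smaller distance. The `Person` type and the rule are illustrative assumptions; the record does not describe the project's strategy at this level of detail.

```python
from dataclasses import dataclass
from typing import List, Optional

# Approximate boundaries of Hall's proxemic zones, in metres.
INTIMATE_MAX = 0.45
PERSONAL_MAX = 1.2
SOCIAL_MAX = 3.6

# Hypothetical priority: lower value = preferred zone.
ZONE_PRIORITY = {"intimate": 0, "personal": 1, "social": 2, "public": 3}


def proxemic_zone(distance_m: float) -> str:
    """Map a distance to one of Hall's four zones."""
    if distance_m < INTIMATE_MAX:
        return "intimate"
    if distance_m < PERSONAL_MAX:
        return "personal"
    if distance_m < SOCIAL_MAX:
        return "social"
    return "public"


@dataclass
class Person:
    name: str
    distance_m: float
    is_talking: bool = False


def select_partner(people: List[Person]) -> Optional[Person]:
    """Hypothetical rule: skip the public zone, then prefer nearer zone,
    then a talking person, then the smaller distance."""
    candidates = [p for p in people if proxemic_zone(p.distance_m) != "public"]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: (ZONE_PRIORITY[proxemic_zone(p.distance_m)],
                       not p.is_talking,  # False sorts first, so talkers win ties
                       p.distance_m),
    )


people = [Person("A", 2.0, True), Person("B", 0.9), Person("C", 5.0, True)]
print(select_partner(people).name)  # B
```

Ranking by zone before anything else makes the rule robust to small distance jitter inside a zone, at the cost of ignoring how close someone is to a zone boundary.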
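Item 3 detects a colored target (here, a face) with a nearest neighbor classifier. A minimal 1-nearest-neighbor sketch in RGB space follows: a pixel is labeled as target when its closest labeled sample is a target sample, and the centroid of the target pixels gives a position that a tracker can follow. The RGB feature space, the sample colors, and the function names are assumptions for illustration; the project's detector may use a different color space and additional machinery.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def squared_distance(a: RGB, b: RGB) -> int:
    """Squared Euclidean distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_target(pixel: RGB, target_samples: List[RGB], background_samples: List[RGB]) -> bool:
    """1-NN rule: target if the nearest labeled sample is a target sample."""
    d_t = min(squared_distance(pixel, s) for s in target_samples)
    d_b = min(squared_distance(pixel, s) for s in background_samples)
    return d_t < d_b


def target_centroid(image: Sequence[Sequence[RGB]],
                    target_samples: List[RGB],
                    background_samples: List[RGB]) -> Optional[Tuple[float, float]]:
    """(row, col) centroid of the pixels classified as target, or None."""
    hits = [(r, c) for r, row in enumerate(image) for c, px in enumerate(row)
            if is_target(px, target_samples, background_samples)]
    if not hits:
        return None
    n = len(hits)
    return (sum(r for r, _ in hits) / n, sum(c for _, c in hits) / n)


skin = [(220, 170, 140), (200, 150, 120)]   # hypothetical target samples
background = [(30, 30, 30), (0, 0, 255)]    # hypothetical background samples
dark = (20, 20, 20)
image = [[dark, dark, dark],
         [dark, (210, 160, 130), (205, 155, 125)],
         [dark, dark, dark]]
print(target_centroid(image, skin, background))  # (1.0, 1.5)
```

Labeling with explicit background samples, rather than a single skin-color threshold, lets the classifier reject background colors that lie close to skin tones.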
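Item 5 combines a user model with recovery from speech recognition errors. One common pattern, sketched here with hypothetical thresholds and a one-flag user model, is to accept high-confidence results, ask the user to confirm medium-confidence ones, and reject low-confidence ones with a re-prompt whose wording depends on the user. This is not the project's dialogue manager, and it does not model domain selection in a multi-domain system; `UserModel`, `choose_response`, and both thresholds are assumptions.

```python
from dataclasses import dataclass

# Hypothetical confidence thresholds; a real system would tune these on data.
ACCEPT_THRESHOLD = 0.8
REJECT_THRESHOLD = 0.3


@dataclass
class UserModel:
    """Hypothetical user model reduced to a single novice/expert flag."""
    is_novice: bool


def choose_response(hypothesis: str, confidence: float, user: UserModel) -> str:
    """Accept, confirm, or reject an ASR hypothesis based on its confidence."""
    if confidence >= ACCEPT_THRESHOLD:
        return f"OK: {hypothesis}"
    if confidence >= REJECT_THRESHOLD:
        # Medium confidence: explicit confirmation gives the user a chance to correct it.
        return f"Did you say '{hypothesis}'?"
    if user.is_novice:
        # Novices get guidance on how to speak, not just a request to repeat.
        return "Sorry, I did not catch that. Please say one short request at a time."
    return "Sorry, please say that again."


print(choose_response("Kyoto", 0.5, UserModel(is_novice=False)))  # Did you say 'Kyoto'?
```

Tailoring only the rejection prompt keeps the sketch small; a fuller user model could also adjust the thresholds themselves for users who are frequently misrecognized.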
Future work includes the design and development of robot audition based on CASA.