Research on Multi-Level Real-Time Integration of Auditory and Visual Information (聴覚・視覚の複数レベル実時間情報統合の研究)

Research Project

Project/Area Number	15017251
Research Category	Grant-in-Aid for Scientific Research on Priority Areas
Allocation Type	Single-year Grants
Review Section	Science and Engineering
Research Institution	Kyoto University
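Principal Investigator	Hiroshi G. Okuno (奥乃博), Kyoto University, Graduate School of Informatics, Professor (60318201)
Co-Investigator(Kenkyū-buntansha)	Kazuhiro Nakadai (中臺一博), Honda Research Institute Japan Co., Ltd., Senior Researcher; Kazunori Komatani (駒谷和範), Kyoto University, Graduate School of Informatics, Research Associate (40362579)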
Project Period (FY)	2003
Project Status	Completed (Fiscal Year 2003)
Budget Amount	¥5,100,000 (Direct Cost: ¥5,100,000); Fiscal Year 2003: ¥5,100,000 (Direct Cost: ¥5,100,000)
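Keywords	Active audition / Real-time integration of sound and images / Humanoid robot / Scattering theory / Approximation of the head-related transfer function / Interaural phase difference and interaural intensity difference / Missing feature theory / Flexible spoken dialogue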
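Research Abstract	This project aims to design the ability to separate and understand mixtures of sounds, in support of flexible communication between humanoids and humans. In fiscal year 2003 (Heisei 15), we improved the performance of the active direction-pass filter (ADPF) developed in the previous year, which integrates vision and audition at multiple levels such as direction information and speaker information, and we built an interface between the ADPF-based sound source separation system and a speech recognition system. With these, simple recognition of three simultaneous talkers was realized on several robots. We also established a research committee on "Robot Audition" within the Robotics Society of Japan.
(1) Improving the active direction-pass filter (ADPF) with scattering theory: The ADPF separates sound arriving from a specific direction, based on speaker direction information obtained from images and sound. It had previously obtained direction information from the interaural phase difference and interaural intensity difference computed from the input captured by two microphones. By using scattering theory in addition to auditory epipolar geometry to approximate the head-related transfer function more accurately, we substantially improved sound source localization and separation in the peripheral region at angles of 30 degrees or more. We further implemented the method on two humanoid robots, SIG2 and Replie, and confirmed its generality.
(2) Recognition of three simultaneous talkers (preliminary experiments toward a "Prince Shotoku" robot): The system shown in the NHK program 「鉄腕アトムを作る」, broadcast in May of last year, recognized three simultaneous talkers using acoustic models that depended on direction and speaker. Sound separated by the ADPF lacks features in some frequency components and also loses data along the time axis. So that a single acoustic model suffices, we developed a speech recognition system based on missing feature theory and confirmed that a priori (演繹) missing-feature masks substantially improve recognition accuracy for separated sound.
(3) Extension to general sound recognition and dialogue systems: To build flexible spoken dialogue systems, we introduced confidence measures for handling speech recognition errors and developed a method that eliminates unnecessary queries to the user. For non-speech recognition, we also worked on musical instrument sound recognition and sound-imitation word (onomatopoeia) recognition, and established recognition techniques for isolated sounds.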
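As a rough illustration of the direction-pass idea in item (1) of the abstract, the following Python sketch keeps only the frequency bins whose interaural phase difference matches a chosen direction. It is not the project's implementation. It assumes a free-field, far-field model of two microphones with no head between them (so it uses neither auditory epipolar geometry nor scattering theory), it ignores the interaural intensity difference, and every name and value in it (`MIC_DISTANCE`, `tol_rad`, `expected_ipd`, `direction_pass_mask`, `make_spectra`) is hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed room-temperature value)
MIC_DISTANCE = 0.18     # m (hypothetical spacing between the two microphones)


def expected_ipd(freqs_hz, theta_deg, d=MIC_DISTANCE, c=SPEED_OF_SOUND):
    """Interaural phase difference (radians) predicted by a free-field,
    far-field model for a source at azimuth theta_deg (0 = straight ahead)."""
    return 2.0 * np.pi * freqs_hz * d * np.sin(np.deg2rad(theta_deg)) / c


def direction_pass_mask(left_spec, right_spec, freqs_hz, theta_deg, tol_rad=0.3):
    """Boolean mask over frequency bins: True where the observed phase
    difference lies within tol_rad of the one expected for theta_deg."""
    observed = np.angle(left_spec * np.conj(right_spec))
    # Wrap the mismatch into (-pi, pi] before comparing with the tolerance.
    mismatch = np.angle(np.exp(1j * (observed - expected_ipd(freqs_hz, theta_deg))))
    return np.abs(mismatch) <= tol_rad


def make_spectra(freqs_hz, theta_deg):
    """Synthetic unit-magnitude left/right spectra for one source at theta_deg,
    generated with the same free-field model (for demonstration only)."""
    ipd = expected_ipd(freqs_hz, theta_deg)
    return np.exp(1j * ipd / 2.0), np.exp(-1j * ipd / 2.0)


if __name__ == "__main__":
    freqs = np.array([300.0, 500.0, 800.0, 1000.0])
    left, right = make_spectra(freqs, 60.0)       # source at 60 degrees
    mask = direction_pass_mask(left, right, freqs, 60.0)
    separated_left = left * mask                  # bins from other directions become 0
    print(mask, separated_left)
```

In this toy model, low frequencies are passed for almost any direction because the phase difference is small there, and high frequencies become ambiguous once the phase wraps. A more accurate head model is one way to address such limits, but this sketch does not attempt it.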

Report

(1 result)

2003 Annual Research Report

Research Products
(25 results)

[Publications] Hiroshi G.Okuno, Kazuhiro Nakadai, Tino Lourens, Hiroaki Kitano: "Sound and Visual Tracking for Humanoid Robot"Applied Intelligence. 20・3. 253-266 (2004)
- Related Report
  2003 Annual Research Report
[Publications] 北原鉄朗, 後藤真孝, 奥乃博: "音響的類似性を反映した楽器の階層表現の獲得とそれに基づく未知楽器のカテゴリーレベルの音源同定"情報処理学会論文誌. 45・3. 680-689 (2004)
- Related Report
  2003 Annual Research Report
[Publications] 山肩洋子, 河原達也, 奥乃博, 美濃導彦: "音声対話システムにおける物体指示のための信念ネットワークを用いた曖昧性の解消"人工知能学会誌. 19・1F. 47-56 (2004)
- Related Report
  2003 Annual Research Report
[Publications] 北原鉄朗, 後藤真孝, 奥乃博: "音高による音色変化に着目した楽器音の音源同定:F0依存多次元正規分布に基づく識別手法"情報処理学会論文誌. 44・10. 2448-2458 (2004)
- Related Report
  2003 Annual Research Report
[Publications] 中臺一博, 日台健一, 奥乃博, 溝口博, 北野宏明: "ヒューマノイドを対象にした視聴覚統合による実時間人物追跡:アクティブオーディションと顔認識の統合"ロボット学会誌. 21・5. 517-525 (2003)
- Related Report
  2003 Annual Research Report
[Publications] 駒谷和範, 鹿島博晶, 田中克明, 河原達也: "複合的言語制約に基づくキーフレーズ検出を用いた汎用的なデータベース検索音声対話プラットフォーム"情報処理学会論文誌. 44・5. 1333-1342 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Hiroshi G.Okuno, Kazuhiro Nakadai: "Active audition for humanoid robots that can listen to three simultaneous talkers"Journal of the Acoustical Society of America. 113・4,Pt2. 2230-2230 (2003)
- Related Report
  2003 Annual Research Report
[Publications] 奥乃博, 中臺一博: "ロボット聴覚の課題と現状"情報処理. 44・11. 1138-1144 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kazuhiro Nakadai, D.Matsuura, Hiroshi G.Okuno, H.Kitano: "Applying Scattering Theory to Robot Audition System"Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2003). 1147-1152 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Hiroshi G.Okuno, Kazuhiro Nakadai, Hiroaki Kitano: "Realizing Personality in Audio-Visually Triggered Non-verbal Behaviors"Proceedings of IEEE/RAS International Conference on Robots and Automation (ICRA-2003). 392-397 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kazuhiro Nakadai, Hiroshi G.Okuno, Hiroaki Kitano: "Robot Recognizes Three Simultaneous Speech By Active Audition"Proceedings of IEEE/RAS International Conference on Robots and Automation (ICRA-2003). 398-403 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kazuhiro Nakadai, D.Matsuura, Hiroshi G.Okuno, H.Kitano: "Improvement of Three Simultaneous Speech Recognition by Using AV Integration and Scattering Theory for Humanoid"Proceedings of Audio Visual Spoken Processing (AVSP-2003). 157-162 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kazunori Komatani, S.Ueno, T.Kawahara, Hiroshi G.Okuno: "User Modeling in Spoken Dialogue Systems for Flexible Guidance Generation"Proceedings of the Eighth European Conference on Speech Communication and Technology (Eurospeech-2003). 745-748 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kazushi Ishihara, Yasushi Tsubota, Hiroshi G.Okuno: "Automatic Transformation of Environmental Sounds into Sound-Imitation Words Based on Japanese Syllable Structure"Proceedings of the Eighth European Conference on Speech Communication and Technology (Eurospeech-2003). 3185-3188 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kazuhiro Nakadai, D.Matsuura, Hiroshi G.Okuno, H.Tsujino: "Three Simultaneous Speech Recognition by Integration of Active Audition and Face Recognition for Humanoid"Proceedings of the Eighth European Conference on Speech Communication and Technology (Eurospeech-2003). 2705-2708 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Tatsuya Kawahara, Ryosuke Ito, Kazunori Komatani: "Spoken Dialogue System for Queries on Appliance Manuals using Hierarchical Confirmation Strategy"Proceedings of the Eighth European Conference on Speech Communication and Technology (Eurospeech-2003). 1701-1704 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Kazunori Komatani, S.Ueno, T.Kawahara, Hiroshi G.Okuno: "Flexible Guidance Generation using User Model in Spoken Dialogue Systems"Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003). 256-263 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Tetsuro Kitahara, Masataka Goto, Hiroshi G.Okuno: "Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution"Proceedings of 2003 International Conference on Multimedia and Expo (ICME 2003). 405-409 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Tetsuro Kitahara, Masataka Goto, Hiroshi G.Okuno: "Pitch-dependent Musical Instrument Identification and Its Application to Musical Sound Ontology"Developments in Applied Artificial Intelligence. LNAI 2718. 112-122 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Hiroshi G.Okuno, Kazuhiro Nakadai, Hiroaki Kitano: "Design and Implementation of Personality of Humanoids in Human Humanoid Non-verbal Interaction"Developments in Applied Artificial Intelligence. LNAI 2718. 405-409 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Hiroshi G.Okuno, Kazuhiro Nakadai: "Real-time Sound Source Localization and Separation based on Active Audio-Visual Integration"Computational Methods in Neural Modeling. LNCS2686. 110-125 (2003)
- Related Report
  2003 Annual Research Report
[Publications] Tetsuro Kitahara, Masataka Goto, Hiroshi G.Okuno: "Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution"Proceedings of 2003 International Conference on Acoustics, Speech and Signal Processing (ICASSP'2003). 421-424 (2003)
- Related Report
  2003 Annual Research Report
[Publications] S.Yamamoto, Kazuhiro Nakadai, H.Tsujino, Hiroshi G.Okuno: "Improvement of Robot Audition by Interfacing Sound Source Separation and Automatic Speech Recognition with Missing Feature Theory"Proceedings of IEEE/RAS International Conference on Robots and Automation (ICRA-2004). (in press). (2004)
- Related Report
  2003 Annual Research Report
[Publications] Kazunori Komatani, R.Itoh, T.Kawahara, Hiroshi G.Okuno: "Recognition of Emotional States in Spoken Dialogue with a Robot"Proc. of 17th International Conf. on Industrial and Engineering Applications of AI & Expert Systems (IEA/AIE-2004). (in press). (2004)
- Related Report
  2003 Annual Research Report
[Publications] 奥乃博: "AI事典、第2版"共立出版. 544 (2003)
- Related Report
  2003 Annual Research Report
