2012 Fiscal Year Final Research Report

Computational Auditory Scene Analysis Using Active Audio-Visual Integration in a Dynamically Changing Environment

Research Project

Project/Area Number	22700165
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Single-year Grants
Research Field	Perception information processing/Intelligent robotics
Research Institution	Tokyo Institute of Technology
Principal Investigator	NAKADAI Kazuhiro Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Lecturer (70436715)
Project Period (FY)	2010 – 2012
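Keywords	Sensor fusion / Integration (robot audition, active audio-visual integration, active audition, audio-visual speech recognition, audio-visual voice activity detection)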
Research Abstract	A framework for Audio-Visual Integration (AVI) was proposed and implemented. It integrates audio and visual information optimally according to the quality of the information obtained from a robot's camera and microphone. The framework was then extended to "Active Audio-Visual Integration (AAVI)", which improves the quality of the audio and visual information through the robot's own active motion. Preliminary experiments on automatic speech recognition and voice activity detection showed that the AAVI framework worked effectively even in visually and/or auditorily noisy conditions.

Research Products
(27 results)

All Journal Article (8 results) (of which Peer Reviewed: 8 results) Presentation (17 results) Remarks (2 results)

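[Journal Article] Improvement of Outdoor Auditory Scene Analysis Using a Quadrocopter-Mounted Microphone Array by Incremental Noise Estimation (2013)
- Author(s)
  K. Okutani, T. Yoshida, K. Nakamura, K. Nakadai
- Volume
  31 (accepted for publication)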
- Pages
  7-8
- Peer Reviewed
[Journal Article] Audio-Visual Voice Activity Detection Based on an Utterance State Transition Model (2012)
- Author(s)
  K. Nakadai, T. Yoshida
- Journal Title
  
  Advanced Robotics
  
  Volume: 26(10) Pages: 1183-1201
- DOI
  DOI:10.1080/01691864.2012.687152
- Peer Reviewed
[Journal Article] SLAM-based Online Calibration for Asynchronous Microphone Array (2012)
- Author(s)
  H. Miura, T. Yoshida, K. Nakamura, K. Nakadai
- Journal Title
  
  Advanced Robotics
  
  Volume: 26(17) Pages: 1941-1965
- DOI
  DOI:10.1080/01691864.2012.728690
- Peer Reviewed
[Journal Article] Whole Body Motion Noise Cancellation of a Robot for Improved Automatic Speech Recognition (2011)
- Author(s)
  G. Ince, K. Nakadai, T. Rodemann, H. Tsujino, J. Imura
- Journal Title
  
  Advanced Robotics
  
  Volume: 25 Pages: 1405-1426
- DOI
  DOI:10.1163/016918611X579448
- Peer Reviewed
[Journal Article] Ego Noise Cancellation of a Robot using Missing Feature Masks (2011)
- Author(s)
  G. Ince, K. Nakadai, T. Rodemann, H. Tsujino, J. Imura
- Journal Title
  
  Applied Intelligence
  
  Volume: 34 Pages: 360-371
- DOI
  DOI:10.1007/s10489-011-0285-0
- Peer Reviewed
[Journal Article] A Study of a Speech Recognition System Using Two-Layered Audio-Visual Integration for Robot Audition (2010)
- Author(s)
  T. Yoshida, K. Nakadai, H. G. Okuno
- Journal Title
  
  Journal of the Robotics Society of Japan
  
  Volume: 28 Pages: 56-63
- URL
  https://www.jstage.jst.go.jp/article/jrsj/28/8/28_8_970/_pdf
- Peer Reviewed
[Journal Article] Robust Ego Noise Suppression of a Robot (2010)
- Author(s)
  G. Ince, K. Nakadai, T. Rodemann, H. Tsujino, J. Imura
- Journal Title
  
  Trends in Applied Intelligent Systems, Lecture Notes in Computer Science
  
  Volume: 6096/2010 Pages: 62-71
- DOI
  DOI:10.1007/978-3-642-13022-9_7
- Peer Reviewed
[Journal Article] An Improvement in Audio-Visual Voice Activity Detection for Automatic Speech Recognition (2010)
- Author(s)
  T. Yoshida, K. Nakadai, H. G. Okuno
- Journal Title
  
  Trends in Applied Intelligent Systems, Lecture Notes in Computer Science
  
  Volume: 6096/2010 Pages: 51-61
- DOI
  DOI:10.1007/978-3-642-13022-9_6
- Peer Reviewed
[Presentation] Active Audio-Visual Integration for Robots (2013)
- Author(s)
  K. Nakadai, T. Yoshida
- Organizer
  The 2nd Symposium on Binaural Active Audition for Humanoid Robots (BINAAHR)
- Place of Presentation
  Kyoto
- Year and Date
  2013-03-18
[Presentation] Active Audio-Visual Integration for Voice Activity Detection based on a Causal Bayesian Network (2012)
- Author(s)
  T. Yoshida, K. Nakadai
- Organizer
  IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012)
- Place of Presentation
  Osaka
- Year and Date
  2012-11-29 to 2012-12-15
[Presentation] Improvement of Audio-Visual Score Following in Robot Ensemble with Human Guitarist (2012)
- Author(s)
  T. Itohara, K. Nakadai, T. Ogata, H. G. Okuno
- Organizer
  IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012)
- Place of Presentation
  Osaka
- Year and Date
  2012-11-29 to 2012-12-01
[Presentation] Live Assessment of Beat Tracking for Robot Audition (2012)
- Author(s)
  J. L. Oliveira, G. Ince, K. Nakamura, K. Nakadai, H. G. Okuno, L. P. Reis, F. Gouyon
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2012)
- Place of Presentation
  Vilamoura (Portugal)
- Year and Date
  2012-10-07 to 2012-10-12
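[Presentation] A Study of Active Audio-Visual Voice Activity Detection Using a Causal Model for Robot Audition (2012)
- Author(s)
  T. Yoshida, K. Nakadai
- Organizer
  The 30th Annual Conference of the Robotics Society of Japan
- Place of Presentation
  Sapporo
- Year and Date
  2012-09-17 to 2012-09-20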
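[Presentation] A Study of Voice Activity Detection by Active Audio-Visual Integration: A Causal Model-Based Approach (2012)
- Author(s)
  T. Yoshida, K. Nakadai
- Organizer
  The 36th Meeting of the JSAI Special Interest Group on AI Challenges (AI-Challenge)
- Place of Presentation
  Tokyo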
- Year and Date
  2012-11-15
[Presentation] Audio-Visual Integration for voice activity detection (2012)
- Author(s)
  T. Yoshida, K. Nakadai
- Organizer
  First Symposium on Binaural Active Audition for Humanoid Robots
- Place of Presentation
  Paris (France)
- Year and Date
  2012-02-27
[Presentation] Incremental Learning for Ego Noise Estimation of a Robot (2011)
- Author(s)
  G. Ince, K. Nakadai, T. Rodemann, J. Imura, K. Nakamura, H. Nakajima
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011)
- Place of Presentation
  San Francisco (USA)
- Year and Date
  2011-09-26 to 2011-09-27
[Presentation] Assessment of Single-channel Ego Noise Estimation Methods (2011)
- Author(s)
  G. Ince, K. Nakadai, T. Rodemann, J. Imura, K. Nakamura, H. Nakajima
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011)
- Place of Presentation
  San Francisco (USA)
- Year and Date
  2011-09-26 to 2011-09-27
[Presentation] Multi-talker Speech Recognition under Ego-motion Noise using Missing Feature Theory (2011)
- Author(s)
  G. Ince, K. Nakadai, T. Rodemann, H. Tsujino, J. Imura
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010)
- Place of Presentation
  Taipei (Taiwan)
- Year and Date
  2011-10-19
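[Presentation] A Study of Active Audio-Visual Integration Based on Information Content Level for Robots (2011)
- Author(s)
  T. Yoshida, K. Nakamura, K. Nakadai
- Organizer
  The 29th Annual Conference of the Robotics Society of Japan, Robotics Society of Japan
- Place of Presentation
  Tokyo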
- Year and Date
  2011-09-09
[Presentation] Assessment of General Applicability of Ego Noise Estimation - Applications to Automatic Speech Recognition and Sound Source Localization (2011)
- Author(s)
  G. Ince, K. Nakamura, F. Asano, H. Nakajima, K. Nakadai
- Organizer
  IEEE-RAS International Conference on Robotics and Automation (ICRA 2011)
- Place of Presentation
  Shanghai (China)
- Year and Date
  2011-05-11
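[Presentation] A Study of Modality Selection Based on a Hybrid Dynamical System for Voice Activity Detection by a Robot (2010)
- Author(s)
  T. Yoshida, K. Nakadai
- Organizer
  The 11th SICE System Integration Division Annual Conference
- Place of Presentation
  Sendai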
- Year and Date
  2010-12-23
[Presentation] Two-Layered Audio-Visual Speech Recognition for Robots in Noisy Environments (2010)
- Author(s)
  T. Yoshida, K. Nakadai, H. G. Okuno
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010)
- Place of Presentation
  Taipei (Taiwan)
- Year and Date
  2010-10-19
[Presentation] Audio-visual speech recognition system for a robot (2010)
- Author(s)
  T. Yoshida, K. Nakadai
- Organizer
  International Conference on Auditory-Visual Speech Processing (AVSP 2010)
- Place of Presentation
  Hakone
- Year and Date
  2010-10-01
[Presentation] A Robust Speech Recognition System against the Ego Noise of a Robot (2010)
- Author(s)
  G. Ince, K. Nakadai, T. Rodemann, H. Tsujino, J. Imura
- Organizer
  International Conference on Spoken Language Processing (Interspeech 2010)
- Place of Presentation
  Chiba
- Year and Date
  2010-09-29
[Presentation] Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots (2010)
- Author(s)
  T. Yoshida, K. Nakadai
- Organizer
  International Conference on Spoken Language Processing (Interspeech 2010)
- Place of Presentation
  Chiba
- Year and Date
  2010-09-29
[Remarks] HARK page: open-source software for robot audition
- URL
  http://winnie.kuis.kyoto-u.ac.jp/
[Remarks] Nakadai Laboratory website, Tokyo Institute of Technology
- URL
  http://www.cyb.mei.titech.ac.jp/nakadai
