Development of Acoustic-Signal-Based Music Interaction Methods for Realizing Co-Player Music Robots

Research Project

Project/Area Number	11J06577
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Perception information processing/Intelligent robotics
Research Institution	Kyoto University
Principal Investigator	Takuma Otsuka, Kyoto University, Graduate School of Informatics, JSPS Research Fellow (DC1)
Project Period (FY)	2011 – 2014-03-31
Project Status	Completed (Fiscal Year 2013)
Budget Amount	¥1,900,000 (Direct Cost: ¥1,900,000) Fiscal Year 2013: ¥600,000 (Direct Cost: ¥600,000) Fiscal Year 2012: ¥600,000 (Direct Cost: ¥600,000) Fiscal Year 2011: ¥700,000 (Direct Cost: ¥700,000)
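Keywords	Microphone array / Nonparametric Bayes / Robot audition / Auditory scene understanding / Bayesian model / Sound source localization and separation / Dereverberation / Score following / Bayesian topic model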
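Research Abstract	This research uses microphone arrays to realize the ability to distinguish among various sounds, a technology essential for robots that listen, including co-player music robots. Conventional microphone array processing often imposes assumptions and constraints on the input mixture or on the environment in which the sound sources exist. For example, the number of sources in the input mixture may be assumed known, or the parameters of the reverberation caused by reflections off walls and floors may be assumed known. To handle the unknown factors of a robot's listening environment flexibly, we formulated microphone array processing probabilistically on the basis of Bayesian statistical models, and developed methods that require no situation-specific parameter tuning even when the number of sources or the amount of reverberation is unknown. Specifically, by applying nonparametric Bayesian models, (1) for the problem of an unknown number of sources, we avoid having to select the model complexity to match the number of sources, and (2) we removed the need to manually tune the order of the autoregressive model to match the amount of reverberation. The contribution of this research is that three functions commonly realized by microphone array processing, (a) sound source separation, (b) sound source localization, and (c) dereverberation, were formulated as a unified model based on nonparametric Bayes, and its effectiveness was demonstrated. Of these results, the method for separating sound mixtures was published in a peer-reviewed English-language journal. An extension of this method to a model capable of dereverberation has also been submitted to an English-language journal and is currently under review. This series of results was compiled into a doctoral dissertation.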
Current Status of Research Progress	2: Research has progressed on the whole more than originally planned. Reason: Toward the goal of robots that robustly distinguish and understand sounds in diverse acoustic environments, we were able to develop robust methods for distinguishing sounds.
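Strategy for Future Research Activity	It is important to extend the processing of sounds separated and extracted (distinguished) from mixtures toward auditory scene understanding, such as pitch extraction, speech recognition, and sound source identification. In conventional pattern recognition for sound, acoustic features have been vulnerable to processing that introduces distortion, such as source separation; addressing this is a major challenge for future work.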

Report

(3 results)

Research Products
(18 results)

Journal Article (6 results) (of which Peer Reviewed: 6 results), Presentation (10 results), Remarks (1 result), Patent (Industrial Property Rights) (1 result)

[Journal Article] Bayesian Nonparametrics for Microphone Array Processing (2014)
- Author(s)
  T. Otsuka, et al.
- Journal Title
  IEEE/ACM Transactions on Audio, Speech and Language Processing
  Volume: 22 Issue: 2 Pages: 493-504
- DOI
  10.1109/taslp.2013.2294582
- Related Report
  2013 Annual Research Report
- Peer Reviewed
[Journal Article] Spatio-Temporal Dynamics in Collective Frog Choruses Examined by Mathematical Modeling and Field Observation (2014)
- Author(s)
  I. Aihara, et al.
- Journal Title
  Scientific Reports
  Volume: 4 Issue: 1 Pages: 3891
- DOI
  10.1038/srep03891
- Related Report
  2013 Annual Research Report
- Peer Reviewed
[Journal Article] Nonparametric Bayesian Sparse Factor Analysis for Frequency Domain Blind Source Separation without Permutation Ambiguity (2013)
- Author(s)
  Kohei Nagira, Takuma Otsuka, Hiroshi G. Okuno
- Journal Title
  EURASIP Journal on Audio, Speech, and Music Processing
  Volume: 2013(4) Issue: 1 Pages: 14-14
- DOI
  10.1186/1687-4722-2013-4
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Journal Article] Infinite Sparse Factor Analysis for Blind Source Separation in Reverberant Environments (2012)
- Author(s)
  Kohei Nagira, Takuma Otsuka, Tetsuya Ogata, Hiroshi G. Okuno
- Journal Title
  Lecture Notes in Computer Science
  Volume: 7626 Pages: 638-647
- DOI
  10.1007/978-3-642-34166-3_70
- ISBN
  9783642341656, 9783642341663
- Related Report
  2012 Annual Research Report
- Peer Reviewed
[Journal Article] A multi-modal tempo and beat tracking system based on audio-visual information from live guitar performances (2012)
- Author(s)
  T. Itohara, T. Otsuka, T. Mizumoto, et al.
- Journal Title
  EURASIP Journal on Audio, Speech, and Music Processing
  Volume: 2012:6 Issue: 1 Pages: 1-15
- DOI
  10.1186/1687-4722-2012-6
- Related Report
  2011 Annual Research Report
- Peer Reviewed
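[Journal Article] Musical Ensemble Robot: Real-Time Synchronization with a Human Flutist via Image Recognition of Start and End Cues (音楽共演ロボット:開始・終了キューの画像認識による人間のフルート奏者との実時間同期) (2011)
- Author(s)
  Angelica Lim, Takeshi Mizumoto, Takuma Otsuka, et al.
- Journal Title
  IPSJ Journal (情報処理学会論文誌)
  Volume: 52 Issue: 12 Pages: 3599-3610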
- NAID
  110008719935
- Related Report
  2011 Annual Research Report
- Peer Reviewed
[Presentation] Solving Google's Continuous Audio CAPTCHA with HMM-based Automatic Speech Recognition (2013)
- Author(s)
  S. Sano, T. Otsuka, et al.
- Organizer
  the 8th International Workshop on Security (IWSEC 2013)
- Place of Presentation
  Okinawa
- Year and Date
  2013-11-18
- Related Report
  2013 Annual Research Report
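[Presentation] A Unified Nonparametric Bayesian Model for Sound Source Localization and Separation Using a Microphone Array (マイクロホンアレイを用いた音源定位・分離の統一的ノンパラメトリックベイズモデル) (2012)
- Author(s)
  Takuma Otsuka, et al.
- Organizer
  27th Signal Processing Symposium
- Place of Presentation
  ANA InterContinental Ishigaki Resort (Okinawa Prefecture)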
- Year and Date
  2012-11-29
- Related Report
  2012 Annual Research Report
[Presentation] Unified Auditory Functions based on Bayesian Topic Model (2012)
- Author(s)
  Takuma Otsuka, et al.
- Organizer
  Proc. of International Conference on Intelligent Robots and Systems (IROS-2012)
- Place of Presentation
  Vilamoura (Portugal)
- Year and Date
  2012-10-09
- Related Report
  2012 Annual Research Report
[Presentation] Bayesian Unification of Sound Source Localization and Separation with Permutation Resolution (2012)
- Author(s)
  Takuma Otsuka, et al.
- Organizer
  Proc. of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12)
- Place of Presentation
  Toronto (Canada)
- Year and Date
  2012-07-26
- Related Report
  2012 Annual Research Report
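[Presentation] A Bayesian Extension of the MUSIC Sound Source Localization Method (音源定位手法MUSICのベイズ拡張) (2011)
- Author(s)
  Takuma Otsuka, et al.
- Organizer
  34th AI Challenge Study Group Meeting (第34回AIチャレンジ研究会)
- Place of Presentation
  Keio University (Tokyo)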
- Year and Date
  2011-12-15
- Related Report
  2011 Annual Research Report
[Presentation] An interactive musical ensemble with the NAO Thereminist (2011)
- Author(s)
  Angelica Lim, Takeshi Mizumoto, Takuma Otsuka, et al.
- Organizer
  34th AI Challenge Study Group Meeting (第34回AIチャレンジ研究会)
- Place of Presentation
  Keio University (Tokyo)
- Year and Date
  2011-12-15
- Related Report
  2011 Annual Research Report
[Presentation] Incremental Bayesian Audio-to-Score Alignment with Flexible Harmonic Structure Models (2011)
- Author(s)
  T. Otsuka, et al.
- Organizer
  12th International Society for Music Information Retrieval Conference
- Place of Presentation
  Miami (USA)
- Year and Date
  2011-10-26
- Related Report
  2011 Annual Research Report
[Presentation] Particle-filter Based Audio-visual Beat-tracking for Music Robot Ensemble with Human Guitarist (2011)
- Author(s)
  T. Itohara, T. Mizumoto, T. Otsuka, et al.
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems
- Place of Presentation
  San Francisco (USA)
- Year and Date
  2011-09-26
- Related Report
  2011 Annual Research Report
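[Presentation] A Bayesian Extension of Sound Source Localization Using the MUSIC Method (MUSIC法を用いた音源定位のベイズ拡張) (2011)
- Author(s)
  Takuma Otsuka, et al.
- Organizer
  The 29th Annual Conference of the Robotics Society of Japan
- Place of Presentation
  Shibaura Institute of Technology (Tokyo)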
- Year and Date
  2011-09-09
- Related Report
  2011 Annual Research Report
[Presentation] Bayesian Extension of MUSIC for Sound Source Localization and Tracking (2011)
- Author(s)
  T. Otsuka, et al.
- Organizer
  International Conference on Spoken Language Processing (INTERSPEECH)
- Place of Presentation
  Florence (Italy)
- Year and Date
  2011-08-30
- Related Report
  2011 Annual Research Report
[Remarks]
- URL
  http://winnie.kuis.kyoto-u.ac.jp/members/ohtsuka/research_demo_jp.html
- Related Report
  2013 Annual Research Report
[Patent (Industrial Property Rights)] Robot, Robot Control Method, and Program (ロボット、ロボット制御方法およびプログラム) (2011)
- Inventor(s)
  Kazuhiro Nakadai, Takuma Otsuka, et al.
- Industrial Property Rights Holder
  Honda Motor Co., Ltd.
- Patent Publication Number
  2011-180590
- Filing Date
  2011-09-15
- Related Report
  2011 Annual Research Report
