2016 Fiscal Year Annual Research Report

A Unified Bayesian Approach to Simultaneous Speech Recognition for Mixture Signals

Research Project

Project/Area Number	15K12063
Research Institution	Kyoto University
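Principal Investigator	Kazuyoshi Yoshii, Kyoto University, Graduate School of Informatics, Senior Lecturer (20510001)
Co-Investigator (Kenkyū-buntansha)	Katsutoshi Itoyama, Kyoto University, Graduate School of Informatics, Assistant Professor (60614451)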
Project Period (FY)	2015-04-01 – 2017-03-31
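Keywords	Speech recognition / Sound source separation / Bayesian model
Outline of Annual Research Achievements	This fiscal year, aiming at a fundamental improvement in source separation accuracy, we studied a multichannel source separation method that uses nested Bayesian mixture and factor models based on a low-rank source model and a sparse superposition process. Conventional source separation methods either assume that the source model is low-rank and use NMF, a factor model, or assume that the sources are sparse in the superposition process and use LDA, a mixture model. The proposed method achieves highly accurate separation by integrating the source model and the superposition process. In addition, focusing on the relationship between factor models and mixture models, we proposed several separation methods by modeling each of the source model and the superposition process with either a factor model or a mixture model. This work was accepted at ICASSP 2017, a top conference in signal processing, and is currently conditionally accepted by IEEE/ACM TASLP. We also studied speech enhancement. Specifically, we developed a method that accurately decomposes the input audio signal, a multichannel amplitude spectrogram, into a low-rank component (noise) and a sparse component (target speech) without prior information such as the microphone array or the surrounding environment. The method was originally developed for a rescue robot that searches inside rubble, but it proved effective for speech enhancement in general settings as well, and we confirmed a substantial improvement in the speech recognition rate. This work was accepted at EUSIPCO 2017, an international conference on signal processing, and has also been accepted by the English-language journal JRM. Through this series of studies, the whole process of speech enhancement, separation, and recognition is now formulated on the basis of probabilistic models, which makes it possible to further improve the MCMC-based method for integrating acoustic signal processing and speech recognition that we developed in the previous fiscal year.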
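
The decomposition of an amplitude spectrogram into a low-rank component (noise) and a sparse component (target speech) described above can be illustrated with a small sketch. This is not the project's method: the reported approach is multichannel and variational Bayesian, whereas the code below is a deterministic, single-channel simplification that minimizes a squared error with an L1 penalty using multiplicative updates. The function name `robust_nmf` and the parameters `rank`, `sparsity`, and `n_iter` are hypothetical names chosen for illustration.

```python
import numpy as np


def robust_nmf(X, rank=4, sparsity=0.5, n_iter=200, seed=0):
    """Split a nonnegative matrix X (freq x time) into W @ H (low-rank) + S (sparse).

    Illustrative assumption, not the report's model: minimizes
        0.5 * ||X - W @ H - S||_F^2 + sparsity * sum(S)
    subject to W, H, S >= 0, using multiplicative updates.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    eps = 1e-12  # guards against division by zero

    # Random nonnegative initialization of all three factors.
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    S = rng.random((n_freq, n_time)) + eps

    for _ in range(n_iter):
        # Update the spectral bases of the low-rank part.
        Y = W @ H + S
        W *= (X @ H.T) / (Y @ H.T + eps)
        # Update the temporal activations of the low-rank part.
        Y = W @ H + S
        H *= (W.T @ X) / (W.T @ Y + eps)
        # Update the sparse part; the L1 penalty enters the denominator.
        Y = W @ H + S
        S *= X / (Y + sparsity + eps)

    return W, H, S
```

Under these assumptions, `W @ H` plays the role of the low-rank noise estimate and `S` the sparse speech estimate; a larger `sparsity` value pushes more of the energy into the low-rank part.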

Research Products
(11 results)

Journal Article (2 results) (of which Peer Reviewed: 2 results, Open Access: 2 results, Acknowledgement Compliant: 2 results) / Presentation (9 results) (of which Int'l Joint Research: 6 results)

[Journal Article] Layout Optimization of Cooperative Distributed Microphone Arrays Based on Estimation of Source Separation Performance (2017)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Journal Title
  Journal of Robotics and Mechatronics
  Volume: 29, No. 1, Pages: 83-93
- DOI
  10.20965/jrm.2017.p0083
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Low-Latency and High-Quality Two-Stage Human-Voice-Enhancement System for a Hose-Shaped Rescue Robot (2017)
- Author(s)
  Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno
- Journal Title
  Journal of Robotics and Mechatronics
  Volume: 29, No. 1, Pages: 198-212
- DOI
  10.20965/jrm.2017.p0198
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Presentation] Bayesian Multichannel Nonnegative Matrix Factorization for Audio Source Separation and Localization (2017)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Place of Presentation
  New Orleans, USA
- Year and Date
  2017-03-05 – 2017-03-09
- Int'l Joint Research
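[Presentation] Nested Bayesian Mixture and Factor Models Based on a Low-Rank Source Model and a Sparse Superposition Process for Multichannel Source Separation (2016)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
- Organizer
  IEICE 19th Information-Based Induction Sciences Workshop (IBIS)
- Place of Presentation
  Kyoto University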
- Year and Date
  2016-11-15 – 2016-11-19
[Presentation] Sound-Based Online Localization for an In-Pipe Snake Robot (2016)
- Author(s)
  Yoshiaki Bando, Hiroki Suhara, Motoyasu Tanaka, Tetsushi Kamegawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Fumitoshi Matsuno, Hiroshi G. Okuno
- Organizer
  IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR)
- Place of Presentation
  Lausanne, Switzerland
- Year and Date
  2016-10-23 – 2016-10-27
- Int'l Joint Research
[Presentation] Online Simultaneous Localization and Mapping of Multiple Sound Sources and Asynchronous Microphone Arrays (2016)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Keisuke Nakamura, Kazuhiro Nakadai, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Place of Presentation
  Daejeon, Korea
- Year and Date
  2016-10-09 – 2016-10-14
- Int'l Joint Research
[Presentation] Student's t Multichannel Nonnegative Matrix Factorization for Blind Source Separation (2016)
- Author(s)
  Koichi Kitamura, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  IEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
- Place of Presentation
  Xi'an, China
- Year and Date
  2016-09-13 – 2016-09-16
- Int'l Joint Research
[Presentation] A Unified Bayesian Model of Time-Frequency Clustering and Low-Rank Approximation for Multi-Channel Source Separation (2016)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Place of Presentation
  Budapest, Hungary
- Year and Date
  2016-08-29 – 2016-09-02
- Int'l Joint Research
[Presentation] Variational Bayesian Multi-Channel Robust NMF for Human-Voice Enhancement with a Deformable and Partially-Occluded Microphone Array (2016)
- Author(s)
  Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Place of Presentation
  Budapest, Hungary
- Year and Date
  2016-08-29 – 2016-09-02
- Int'l Joint Research
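[Presentation] Time-Frequency Clustering Based on a Nested Basis and Source Mixture Model for Multichannel Source Separation (2016)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Kyoto University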
- Year and Date
  2016-08-24 – 2016-08-25
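[Presentation] Speech Enhancement Tolerating Microphone Movement and Occlusion Based on Variational Bayesian Multichannel Robust NMF (2016)
- Author(s)
  Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Tatsuya Kawahara, Hiroshi G. Okuno
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Kyoto University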
- Year and Date
  2016-08-24 – 2016-08-25
