A Unified Bayesian Approach to Simultaneous Speech Recognition for Mixture Signals

Research Project

Project/Area Number	15K12063
Research Category	Grant-in-Aid for Challenging Exploratory Research
Allocation Type	Multi-year Fund
Research Field	Perceptual information processing
Research Institution	Kyoto University
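Principal Investigator	Yoshii Kazuyoshi Kyoto University, Graduate School of Informatics, Senior Lecturer (20510001)
Co-Investigator(Kenkyū-buntansha)	Itoyama Katsutoshi Kyoto University, Graduate School of Informatics, Assistant Professor (60614451)
Co-Investigator(Renkei-kenkyūsha)	KAWAHARA Tatsuya Kyoto University, Graduate School of Informatics, Professor (00234104); MOCHIHASHI Daichi The Institute of Statistical Mathematics, Department of Statistical Modeling, Associate Professor (80418508)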
Project Period (FY)	2015-04-01 – 2017-03-31
Project Status	Completed (Fiscal Year 2016)
Budget Amount	¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000); Fiscal Year 2016: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000); Fiscal Year 2015: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Keywords	source separation / speech recognition / probabilistic model / Bayesian model / MCMC
Outline of Final Research Achievements	We proposed a method that simultaneously recognizes multiple utterances by using a probabilistic model of source separation. Because the source signals are uncertain, we combined speech recognition with source separation by considering the posterior distribution of the source signals. This enabled us to obtain recognition results directly from mixture signals without uniquely determining the source signals. In addition, we proposed a source separation method based on an integrated model that combines a source model and a superimposition model. Each model is represented as either a mixture model (LDA) or a factor model (NMF), and we evaluated the performance of each combination.

Report

(3 results)

2016 Annual Research Report, Final Research Report (PDF)
2015 Research-status Report

Research Products
(13 results)

Journal Article (2 results) (of which Peer Reviewed: 2 results, Open Access: 2 results, Acknowledgement Compliant: 2 results); Presentation (11 results) (of which Int'l Joint Research: 7 results)

[Journal Article] Layout Optimization of Cooperative Distributed Microphone Arrays Based on Estimation of Source Separation Performance (2017)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Journal Title
  Journal of Robotics and Mechatronics
  Volume: 29 Issue: 1 Pages: 83-93
- DOI
  10.20965/jrm.2017.p0083
- NAID
  130007519901
- ISSN
  0915-3942, 1883-8049
- Year and Date
  2017-02-20
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Low Latency and High Quality Two-Stage Human-Voice-Enhancement System for a Hose-Shaped Rescue Robot (2017)
- Author(s)
  Yoshiaki Bando, Hiroshi Saruwatari, Nobutaka Ono, Shoji Makino, Katsutoshi Itoyama, Daichi Kitamura, Masaru Ishimura, Moe Takakusaki, Narumi Mae, Kouei Yamaoka, Yutaro Matsui, Yuichi Ambe, Masashi Konyo, Satoshi Tadokoro, Kazuyoshi Yoshii, Hiroshi G. Okuno
- Journal Title
  Journal of Robotics and Mechatronics
  Volume: 29 Issue: 1 Pages: 198-212
- DOI
  10.20965/jrm.2017.p0198
- NAID
  130007519848
- ISSN
  0915-3942, 1883-8049
- Year and Date
  2017-02-20
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Presentation] Bayesian Multichannel Nonnegative Matrix Factorization for Audio Source Separation and Localization (2017)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
- Organizer
  IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Place of Presentation
  New Orleans, USA
- Year and Date
  2017-03-05
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
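[Presentation] Nested Bayesian Mixture and Factor Models Based on Low-Rank Source Models and Sparse Superimposition Processes for Multichannel Source Separation (2016)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
- Organizer
  The 19th Information-Based Induction Sciences Workshop (IBIS), IEICE
- Place of Presentation
  Kyoto University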
- Year and Date
  2016-11-15
- Related Report
  2016 Annual Research Report
[Presentation] Sound-Based Online Localization for an In-Pipe Snake Robot (2016)
- Author(s)
  Yoshiaki Bando, Hiroki Suhara, Motoyasu Tanaka, Tetsushi Kamegawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Fumitoshi Matsuno, Hiroshi G. Okuno
- Organizer
  IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR)
- Place of Presentation
  Lausanne, Switzerland
- Year and Date
  2016-10-23
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Online Simultaneous Localization and Mapping of Multiple Sound Sources and Asynchronous Microphone Arrays (2016)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Keisuke Nakamura, Kazuhiro Nakadai, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- Place of Presentation
  Daejeon, Korea
- Year and Date
  2016-10-09
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Student's t Multichannel Nonnegative Matrix Factorization for Blind Source Separation (2016)
- Author(s)
  Koichi Kitamura, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  IEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
- Place of Presentation
  Xi'an, China
- Year and Date
  2016-09-13
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] A Unified Bayesian Model of Time-Frequency Clustering and Low-Rank Approximation for Multi-Channel Source Separation (2016)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Place of Presentation
  Budapest, Hungary
- Year and Date
  2016-08-29
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
[Presentation] Variational Bayesian Multi-Channel Robust NMF for Human-Voice Enhancement with a Deformable and Partially-Occluded Microphone Array (2016)
- Author(s)
  Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Hiroshi G. Okuno
- Organizer
  European Signal Processing Conference (EUSIPCO)
- Place of Presentation
  Budapest, Hungary
- Year and Date
  2016-08-29
- Related Report
  2016 Annual Research Report
- Int'l Joint Research
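[Presentation] Time-Frequency Clustering Based on a Nested Basis and Source Mixture Model for Multichannel Source Separation (2016)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Kyoto University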
- Year and Date
  2016-08-24
- Related Report
  2016 Annual Research Report
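[Presentation] Speech Enhancement Tolerant of Microphone Movement and Occlusion Based on Variational Bayesian Multichannel Robust NMF (2016)
- Author(s)
  Yoshiaki Bando, Katsutoshi Itoyama, Masashi Konyo, Satoshi Tadokoro, Kazuhiro Nakadai, Kazuyoshi Yoshii, Tatsuya Kawahara, Hiroshi G. Okuno
- Organizer
  IEICE Technical Committee on Speech (SP)
- Place of Presentation
  Kyoto University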
- Year and Date
  2016-08-24
- Related Report
  2016 Annual Research Report
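[Presentation] Speech Recognition Considering the Uncertainty of Source Signals Based on a Bayesian Model for Source Separation (2015)
- Author(s)
  Kousuke Itakura, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  Acoustical Society of Japan, 2015 Autumn Meeting
- Place of Presentation
  University of Aizu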
- Year and Date
  2015-09-16
- Related Report
  2015 Research-status Report
[Presentation] Bayesian Integration of Sound Source Separation and Speech Recognition: A New Approach to Simultaneous Speech Recognition (2015)
- Author(s)
  Kousuke Itakura, Izaya Nishimuta, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii
- Organizer
  Interspeech
- Place of Presentation
  Dresden, Germany
- Year and Date
  2015-09-06
- Related Report
  2015 Research-status Report
- Int'l Joint Research