Integration of Speech Enhancement and Recognition Based on Disentangled Representations of Speaker and Linguistic Features

Research Project

Project/Area Number	20H01159
Research Category	Grant-in-Aid for Encouragement of Scientists
Allocation Type	Single-year Grants
Review Section	4110: Information science, computer engineering, human informatics, applied informatics, and related fields
Research Institution	Institute of Physical and Chemical Research
Principal Investigator	Sekiguchi Kouhei, RIKEN, Center for Advanced Intelligence Project, Technical Staff I
Project Period (FY)	2020-04-01 –
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥480,000 (Direct Cost: ¥480,000) Fiscal Year 2020: ¥480,000 (Direct Cost: ¥480,000)
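Keywords	Speech enhancement / Source separation / Speech recognition
Outline of Research at the Start	For noise-robust speech recognition, a common approach records speech with multichannel microphones and then performs enhancement of the target speech followed by speech recognition, in sequence. However, recognition accuracy drops sharply when the front-end speech enhancement fails. This project develops a technique for separating speech into linguistic information and speaker information, and uses it to integrate speech recognition with multichannel speech enhancement, thereby improving the accuracy of both enhancement and recognition at the same time.
Outline of Final Research Achievements	In systems that use speech recognition, when the speaker is far from the microphone, the observed signal also contains surrounding noise and reverberation, which makes speech recognition difficult. For this reason, extracting only the speech from the observed signal is an active research topic. This project focused on an existing method that integrates a deep-learning-based generative model of speech into a general-purpose speech enhancement method that does not assume a specific usage environment. By using a generative model of speech that accounts for the property that speech depends on time-invariant speaker information and time-varying linguistic information, we aimed to further improve speech enhancement accuracy.
Academic Significance and Societal Importance of the Research Achievements	Speech recognition already achieves high recognition rates when the speaker is close to the microphone, as with smartphones, but the recognition rate drops substantially when the speaker is far from the microphone because of the influence of the surroundings. Improving the recognition rate in such conditions would make devices such as smart speakers comfortable to use and would make it possible to realize devices that assist people with hearing impairments in daily life. Speech enhancement is therefore an important research topic.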

Report

(2 results)

2020 Annual Research Report Final Research Report ( PDF )

Research Products
(3 results)


All Journal Article (1 results) (of which Int'l Joint Research: 1 results, Peer Reviewed: 1 results, Open Access: 1 results) Presentation (2 results) (of which Int'l Joint Research: 1 results)

[Journal Article] Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation (2020)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara
- Journal Title
  IEEE/ACM Transactions on Audio, Speech, and Language Processing
  Volume: 28 Pages: 2610-2625
- DOI
  10.1109/taslp.2020.3019181
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Semi-supervised Multichannel Speech Separation Based on a Phone- and Speaker-Aware Deep Generative Model of Speech Spectrograms (2021)
- Author(s)
  Yicheng Du, Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii, Tatsuya Kawahara
- Organizer
  2020 28th European Signal Processing Conference (EUSIPCO)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
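[Presentation] Joint Blind Source Separation and Dereverberation Based on ARMA-FastMNMF (2021)
- Author(s)
  Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
- Organizer
  Acoustical Society of Japan, 2021 Spring Meeting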
- Related Report
  2020 Annual Research Report
