2010 Fiscal Year Annual Research Report

Hands-free speech recognition robust to utterances by multiple speakers using multichannel least mean squares (LMS)

Research Project

Project/Area Number	22700169
Research Institution	Shizuoka University
Principal Investigator	王龍標 (Longbiao Wang), Shizuoka University, Faculty of Engineering, Assistant Professor (30510458)
Keywords	hands-free speech recognition / blind dereverberation / multichannel LMS / generalized spectral subtraction / missing feature theory
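Research Abstract	We formulated how sound is generated in real environments, estimated the transfer characteristics of the transmission path automatically, and applied dereverberation that is robust to different reverberation conditions (different reverberation times and rooms) and to different recognition tasks (isolated word recognition and large-vocabulary continuous speech recognition), followed by post-processing that uses the reliability of the dereverberation. This achieved highly accurate reverberation processing. The details are as follows.

(1) Spectral-subtraction-based reverberation compensation robust to differences in reverberation characteristics: In preliminary work preceding this project, we had already proposed a method that treats the effect of the late reverberation of the impulse response as additive noise and applies spectral subtraction, using the power spectra of the reverberant speech and of the impulse response, to estimate the power spectrum of clean speech. In FY2010, we evaluated this method on large-vocabulary continuous speech recognition, analyzed how changing its parameters affects performance, and compared the effectiveness of improved variants. The proposed method gave robust results across a range of reverberant environments and tasks.

(2) Reverberation compensation using missing feature theory: Compensation can fail in some frequency ranges of some segments, for example because the estimated impulse response is shorter than the actual one or because the impulse-response parameters are estimated with error. In this work, reverberation is first compensated by spectral subtraction to reduce the influence of signals from preceding frames. An SRR (Signal-to-Reverberation Ratio) is then computed automatically for each frequency at each time, and the compensated spectrum is weighted by multiplying it by a spectral reliability derived from the SRR value. This yielded better recognition performance than the conventional method.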
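The abstract states that the transfer characteristics of the transmission path are estimated automatically, and the project title names multichannel LMS as the means. As a hedged illustration of the cross-relation principle behind this kind of blind channel estimation, the Python sketch below handles two microphones under strong assumptions: a noiseless source, FIR channels of known length `L`, and channels with no common zeros. Because both microphones observe the same source, x1 * h2 = x2 * h1 (where * is convolution), so the channels can be recovered up to a common gain by driving the cross-relation error to zero with an LMS update under a unit-norm constraint. This is a simplified time-domain variant with a normalized step size, not the implementation used in this project; `cross_relation_nlms`, `npm`, and the example channels are hypothetical.

```python
import numpy as np


def cross_relation_nlms(x1, x2, L, mu=0.5, eps=1e-12):
    """Blindly estimate two length-L FIR channels (up to a common scale).

    Uses the cross-relation x1 * h2 = x2 * h1, which holds when both
    microphones observe the same source through channels h1 and h2.
    Sketch only: assumes no noise and a correctly chosen L.
    """
    # Stacked estimate h = [h1; h2], initialised to a unit-norm vector
    # with the first tap of each channel set.
    h = np.zeros(2 * L)
    h[0] = h[L] = 1.0 / np.sqrt(2.0)
    for n in range(L - 1, len(x1)):
        x1_vec = x1[n - L + 1:n + 1][::-1]  # [x1(n), ..., x1(n-L+1)]
        x2_vec = x2[n - L + 1:n + 1][::-1]
        # Cross-relation error e(n) = x1_vec . h2 - x2_vec . h1 = u . h
        u = np.concatenate([-x2_vec, x1_vec])
        e = u @ h
        # Normalized LMS step on e^2, then project back onto the unit
        # sphere to exclude the trivial solution h = 0.
        h = h - mu * e * u / (u @ u + eps)
        h /= np.linalg.norm(h)
    return h[:L], h[L:]


def npm(h_true, h_est):
    """Normalized projection misalignment: 0 means the same direction."""
    h_true = np.asarray(h_true, dtype=float)
    h_est = np.asarray(h_est, dtype=float)
    scale = (h_true @ h_est) / (h_est @ h_est)
    return np.linalg.norm(h_true - scale * h_est) / np.linalg.norm(h_true)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h1 = np.array([1.0, 0.6, -0.2])  # hypothetical short room responses
    h2 = np.array([0.8, -0.4, 0.3])
    s = rng.standard_normal(20000)   # white-noise source
    x1 = np.convolve(s, h1)[:len(s)]
    x2 = np.convolve(s, h2)[:len(s)]
    e1, e2 = cross_relation_nlms(x1, x2, L=3)
    print(npm(np.concatenate([h1, h2]), np.concatenate([e1, e2])))
```

Renormalizing after every step excludes the trivial solution h = 0; because the channels are only identifiable up to a common gain and sign, `npm` compares directions rather than raw coefficients.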

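Items (1) and (2) of the abstract can be illustrated together on power spectrograms. The Python sketch below assumes a frame-wise model in which the reverberant power is |X(t,k)|^2 ≈ sum over d of |H(d,k)|^2 |S(t-d,k)|^2, with d = 0 taken as the early part and d ≥ 1 as late reverberation treated as additive noise. The over-subtraction factor `alpha`, the floor `beta`, and the sigmoid mapping from SRR to reliability (`slope`, `center_db`) are illustrative assumptions, not the parameters or reliability function used in this project; all function names are hypothetical.

```python
import numpy as np


def subtract_late_reverb(X_pow, H_pow, alpha=1.0, beta=0.01):
    """Spectral subtraction of late reverberation (illustrative sketch).

    X_pow: (T, K) reverberant power spectrogram |X(t, k)|^2.
    H_pow: (D, K) impulse-response power per frame delay |H(d, k)|^2;
           d = 0 is the early part, d >= 1 is treated as additive "noise".
    Returns (E_hat, R_hat): the estimated early component H(0) * S and the
    late-reverberation power that was subtracted, both of shape (T, K).
    """
    T, K = X_pow.shape
    D = H_pow.shape[0]
    E_hat = np.zeros((T, K))
    R_hat = np.zeros((T, K))
    for t in range(T):
        late = np.zeros(K)
        for d in range(1, min(D, t + 1)):
            # Earlier estimates hold H(0) * S, so divide by H(0) to get S.
            late += H_pow[d] * E_hat[t - d] / H_pow[0]
        R_hat[t] = alpha * late
        # Subtract, with a spectral floor so power never goes negative.
        E_hat[t] = np.maximum(X_pow[t] - R_hat[t], beta * X_pow[t])
    return E_hat, R_hat


def srr_weights(E_hat, R_hat, slope=0.5, center_db=0.0, eps=1e-12):
    """Map per-bin SRR (dB) to a reliability in (0, 1] with a sigmoid."""
    srr_db = 10.0 * np.log10((E_hat + eps) / (R_hat + eps))
    z = np.clip(slope * (srr_db - center_db), -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(-z))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K = 50, 8
    H_pow = np.array([np.ones(K), 0.5 * np.ones(K), 0.25 * np.ones(K)])
    S = rng.uniform(0.5, 1.5, size=(T, K))  # synthetic clean power
    X = np.zeros((T, K))
    for t in range(T):
        for d in range(min(3, t + 1)):
            X[t] += H_pow[d] * S[t - d]
    E_hat, R_hat = subtract_late_reverb(X, H_pow)
    weighted = srr_weights(E_hat, R_hat) * E_hat  # reliability-weighted
    print(np.max(np.abs(E_hat - S)))
```

When the model and |H(d,k)|^2 are exact and the floor is never reached, the recursion recovers the early component exactly. With the mismatches named in (2), such as a too-short impulse response or parameter errors, it does not, and the SRR-based weights, which lie between 0 and 1, attenuate the unreliable bins rather than discarding them.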
Research Products
(11 results)

Journal Article (2 results, all peer reviewed), Presentation (8 results), Remarks (1 result)

[Journal Article] Distant-talking speech recognition based on spectral subtraction by multi-channel LMS algorithm (2011)
- Author(s)
  L.Wang, N.Kitaoka, S.Nakagawa
- Journal Title
  IEICE Trans. on Information and Systems
  Vol. E94-D, No. 3, pp. 659-667
- Peer Reviewed
[Journal Article] Speaker recognition by combining MFCC and phase information in noisy conditions (2010)
- Author(s)
  L.Wang, K.Minami, K.Yamamoto, S.Nakagawa
- Journal Title
  IEICE Trans. on Information and Systems
  Vol. E93-D, No. 9, pp. 2397-2406
- Peer Reviewed
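[Presentation] Evaluation of large-vocabulary speech recognition using blind dereverberation based on the multichannel LMS algorithm (2011)
- Author(s)
  小谷恭平, 王龍標, 甲斐充彦
- Organizer
  2011 Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Waseda University, Nishi-Waseda Campus (Tokyo)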
- Year and Date
  2011-03-10
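[Presentation] Analysis of how differences between environments simulated with an artificial reverberation model affect distant-talking speaker recognition (2011)
- Author(s)
  岸良樹, 王龍標, 甲斐充彦
- Organizer
  2011 Spring Meeting of the Acoustical Society of Japan
- Place of Presentation
  Waseda University, Nishi-Waseda Campus (Tokyo)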
- Year and Date
  2011-03-10
[Presentation] Multimodal interface with N-best display including candidates of spoken word fragments (2010)
- Author(s)
  Y.Jang, A.Kai, L.Wang
- Organizer
  APSIPA ASC 2010
- Place of Presentation
  Biopolis, Singapore
- Year and Date
  2010-12-16
[Presentation] Investigation of driving-behavior modeling for recognition of a driving situation (2010)
- Author(s)
  J.Ema, L.Wang, A.Kai, T.Itoh
- Organizer
  APSIPA ASC 2010
- Place of Presentation
  Biopolis, Singapore
- Year and Date
  2010-12-15
[Presentation] Compensation approaches for distant speaker identification under reverberant environments (2010)
- Author(s)
  Y.Jiang, Z.Tang, L.Wang
- Organizer
  CCPR 2010
- Place of Presentation
  Chongqing University, Chongqing, China
- Year and Date
  2010-10-23
[Presentation] A study of driving-behavior models for recognizing car driving situations (2010)
- Author(s)
  江間旬記, 王龍標, 甲斐充彦, 伊藤敏彦
- Organizer
  2010 IEICE Society Conference
- Place of Presentation
  Osaka Prefecture University (Osaka)
- Year and Date
  2010-09-16
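[Presentation] A multimodal word-input interface with dynamically constructed multiple candidates including word fragments (2010)
- Author(s)
  張用起, 甲斐充彦, 王龍標
- Organizer
  2010 Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  Kansai University (Osaka)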
- Year and Date
  2010-09-16
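[Presentation] A study of distant-talking speaker recognition robust to environmental differences using an artificial reverberation model (2010)
- Author(s)
  岸良樹, 王龍標, 甲斐充彦
- Organizer
  2010 Autumn Meeting of the Acoustical Society of Japan
- Place of Presentation
  Kansai University (Osaka)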
- Year and Date
  2010-09-14
[Remarks]
- URL
  http://ssp.sys.eng.shizuoka.ac.jp/wang-j.html
