2009 Fiscal Year Annual Research Report

Research on Speech Recognition, Speaker Recognition, and Indexing of Distant Speech in Real-World Environments

Research Project

Project/Area Number	19650040
Research Institution	Toyohashi University of Technology
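Principal Investigator	中川聖一, Toyohashi University of Technology, Faculty of Engineering, Professor (20115893)
Co-Investigator(Kenkyū-buntansha)	山本一公, Toyohashi University of Technology, Faculty of Engineering, Assistant Professor (40324230) / 土屋雅稔, Toyohashi University of Technology, Faculty of Engineering, Assistant Professor (70378256) / 北岡教英, Nagoya University, Graduate School of Information Science, Associate Professor (10333501) / 王龍標, Shizuoka University, Faculty of Engineering, Assistant Professor (30510458)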
Keywords	distant speech / speech recognition / speaker recognition / microphone array / beamformer / indexing / hands-free
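Research Abstract	For distant speech recognition, we developed a recognition method that uses the techniques for identifying the speaker's position and speaking direction developed in FY2008 (H20) and FY2009 (H21). Specifically, the speech is enhanced with a microphone-array beamformer steered according to the identified source position, and the identified speaking direction is used to estimate and restrict the vocabulary being spoken; together these raised the recognition rate. We also developed a technique that allows cepstral mean normalization, a basic method of reverberation compensation, to be applied online from a short utterance. Speech is modeled in advance with a Gaussian mixture model (GMM), each frame of the input speech is mapped to a GMM component, and the frame is normalized with the cepstral mean normalization amount trained beforehand for that component. Whereas the conventional method needs an utterance several words long, the normalization effect was confirmed even for a single-word utterance. For distant-speech speaker recognition, we developed a recognition method that applies the combination of spectral information (MFCC) and phase information developed in FY2008 and FY2009 to speech enhanced by the microphone array. For indexing, we developed a method that extracts named entities such as place names, person names, and organization names from the recognition output, as a post-processing step for the speech recognition and speaker recognition results. Named entities were extracted with fairly high accuracy from text input, but because recognizing distant speech is very difficult, satisfactory results were not obtained for speech input.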
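The GMM-based normalization described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the hard assignment of each frame to its most likely component, the diagonal covariances, the function names (`log_gauss_diag`, `best_component`, `gmm_cmn`), and the toy 2-dimensional values are all assumptions made for illustration. The abstract states only that each frame is mapped to a GMM component and normalized with an amount trained in advance for that component.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def best_component(x, weights, means, variances):
    """Index of the component with the highest weighted log-likelihood.

    Hard assignment is an assumption; a soft (posterior-weighted)
    assignment would be an equally plausible reading of the abstract.
    """
    scores = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def gmm_cmn(frames, weights, means, variances, offsets):
    """Subtract from each frame the offset stored for its component.

    offsets[k] stands in for the normalization amount trained in
    advance for component k (hypothetical values in this sketch).
    """
    normalized = []
    for x in frames:
        k = best_component(x, weights, means, variances)
        normalized.append([xi - oi for xi, oi in zip(x, offsets[k])])
    return normalized

# Toy 2-component, 2-dimensional model (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
offsets = [[0.5, 0.5], [1.0, -1.0]]

frames = [[0.2, -0.1], [4.8, 5.3]]
normalized = gmm_cmn(frames, weights, means, variances, offsets)
print(normalized)
```

Because each frame is normalized as soon as it is assigned to a component, no running average over a long utterance is needed, which is consistent with the abstract's report that the effect appears even for a single word.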

Research Products
(6 results)

Journal Article (3 results, all peer reviewed) / Presentation (2 results) / Remarks (1 result)

[Journal Article] Distant Speech Recognition Using a Microphone Array Network (2010)
- Author(s)
  A.Y.Nakano, S.Nakagawa, K.Yamamoto
- Journal Title
  IEICE Trans. Information & Systems (accepted)
- Peer Reviewed
[Journal Article] Auditory perception versus automatic estimation of location and orientation of an acoustic source in a real environment (2010)
- Author(s)
  A.Y.Nakano, S.Nakagawa, K.Yamamoto
- Journal Title
  ASJ Trans. Acoustical Science and Technology (accepted)
- Peer Reviewed
[Journal Article] Automatic estimation of position and orientation of an acoustic source by a microphone array network (2009)
- Author(s)
  A.Y.Nakano, S.Nakagawa, K.Yamamoto
- Journal Title
  JASA Vol. 126, Pages: 3084-3094
- Peer Reviewed
[Presentation] Speaker identification by combining MFCC and phase information in noisy environments (2010)
- Author(s)
  L.Wang, K.Minami, K.Yamamoto, S.Nakagawa
- Organizer
  Proc. ICASSP
- Place of Presentation
  Dallas (USA)
- Year and Date
  2010-03-16
[Presentation] Speaker identification/verification for reverberant speech using phase information (2009)
- Author(s)
  L.Wang, S.Nakagawa
- Organizer
  Proc. WESPAC X 2009
- Place of Presentation
  Beijing (China) (CD-ROM)
- Year and Date
  2009-09-21
[Remarks]
- URL
  http://www.slp.ics.tut.ac.jp
