2009 Fiscal Year Annual Research Report

Advancement of Speech Recognition Using WFSTs [WFSTによる音声認識の高度化]

Research Project

Project/Area Number	21300062
Research Institution	Tokyo Institute of Technology
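Principal Investigator	古井貞熙 Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Professor (90293076)
Co-Investigator(Kenkyū-buntansha)	篠田浩一 Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Associate Professor (10343097) / 篠崎隆宏 Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Assistant Professor (80447903)
Keywords	Speech recognition / WFST / Decoder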
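Research Abstract	We worked to extend the capabilities of a WFST-based speech recognition decoder and to realize a flexible decoder applicable to a variety of purposes, and achieved the following results.
(1) Improved on-the-fly WFST composition algorithm: To support larger models in speech recognition, we sped up the method that composes the search network dynamically at recognition time (on-the-fly composition). We added speed-up techniques to our previously proposed on-the-fly composition method with optimization. Specifically, we implemented and evaluated optimization of the WFST topology, optimization of the semiring operations used in composition, and a fast method for computing the intersection of two label sets. As a result, we confirmed a substantial improvement in recognition speed on a large-vocabulary speech recognition task using the Corpus of Spontaneous Japanese (CSJ). This also allowed us to achieve real-time speech recognition on a very-large-vocabulary task of several hundred thousand words.
(2) Integration of voice activity detection into the decoder: To achieve robust speech recognition in high-noise environments, we built a decoder that incorporates Voice Activity Detection (VAD) scores. The method computes a speech/non-speech confidence for each frame using Gaussian Mixture Models (GMMs) and uses that confidence to adjust the acoustic likelihoods of hypotheses representing word speech and silence. Compared with the conventional approach of rejecting non-speech frames at the front end (front-end VAD), this method can eliminate errors in which speech frames are mistakenly rejected. It can therefore improve recognition accuracy in conditions where speech and non-speech are hard to distinguish, such as high-noise environments. On the Drivers Japanese Speech Corpus in a Car Environment (DJSC) task, the method gave a substantial improvement in recognition rate over conventional, commonly used front-end VAD methods (a method based on zero-crossing and power thresholds, and a method using the likelihood ratio of speech and non-speech GMMs), confirming its effectiveness.
(3) Release of the speech recognition decoder: We made preparations to release the decoder developed in this project widely to speech recognition researchers.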
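
The VAD-embedded scoring described in item (2) of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact adjustment formula, so everything here is an assumption made for illustration: the function names, the equal-prior posterior used as the per-frame confidence, the additive log-domain adjustment, and the `vad_weight` parameter are hypothetical and are not taken from the project's decoder.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def speech_log_posteriors(x, speech_gmm, nonspeech_gmm):
    """Return (log P(speech|x), log P(nonspeech|x)), assuming equal priors."""
    log_s = gmm_log_likelihood(x, *speech_gmm)
    log_n = gmm_log_likelihood(x, *nonspeech_gmm)
    peak = max(log_s, log_n)
    log_norm = peak + math.log(math.exp(log_s - peak) + math.exp(log_n - peak))
    return log_s - log_norm, log_n - log_norm


def adjusted_acoustic_score(acoustic_log_lik, is_silence_hypothesis,
                            log_post, vad_weight=1.0):
    """Add the matching VAD confidence to a hypothesis's acoustic score.

    Assumption: speech hypotheses are rewarded by log P(speech|x) and
    silence hypotheses by log P(nonspeech|x); no frame is ever discarded.
    """
    log_speech, log_nonspeech = log_post
    confidence = log_nonspeech if is_silence_hypothesis else log_speech
    return acoustic_log_lik + vad_weight * confidence


# Toy one-dimensional models: (weights, means, variances) per GMM.
speech_gmm = ([1.0], [[2.0]], [[1.0]])
nonspeech_gmm = ([1.0], [[-2.0]], [[1.0]])

frame = [2.0]  # a frame that looks like speech under the toy models
post = speech_log_posteriors(frame, speech_gmm, nonspeech_gmm)
speech_score = adjusted_acoustic_score(-10.0, False, post)
silence_score = adjusted_acoustic_score(-10.0, True, post)
print(speech_score, silence_score)
```

Because the confidence only shifts scores and never removes a frame, a misjudged frame can still be recovered by the search, which is the property the abstract contrasts with front-end VAD.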

Research Products
(14 results)

Journal Article (2 results, of which Peer Reviewed: 2 results) / Presentation (12 results)

[Journal Article] Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition (2009)
- Author(s)
  P.Dixon, T.Oonishi, S.Furui
- Journal Title
  Computer Speech and Language 23
  Pages: 510-526
- Peer Reviewed
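[Journal Article] Optimization of on-the-fly composition in a WFST speech recognition decoder [WFST音声認識デコーダにおけるon-the-fly合成の最適化処理] (2009)
- Author(s)
  T.Oonishi, P.Dixon, K.Iwano, S.Furui
- Journal Title
  IEICE Transactions on Information and Systems (Japanese Edition) [電子情報通信学会論文誌] J92-D
  Pages: 1026-1035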
- Peer Reviewed
[Presentation] Initial evaluation of the NET framework as a platform for speech recognition (2010)
- Author(s)
  P.Dixon, S.Furui
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Tokyo
- Year and Date
  2010-03-09
[Presentation] An empirical comparison of Sphinx and HTK models for speech recognition (2010)
- Author(s)
  J.Novak, P.Dixon, S.Furui
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Tokyo
- Year and Date
  2010-03-09
[Presentation] User interface evaluations for a multimodal ASR-driven train timetables application (2010)
- Author(s)
  J.Novak, S.Furui
- Organizer
  Acoustical Society of Japan Spring Meeting
- Place of Presentation
  Tokyo
- Year and Date
  2010-03-08
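[Presentation] Recent evaluation results of a WFST-driven speech recognition decoder [WFST駆動音声認識デコーダの最近の評価結果] (2009)
- Author(s)
  P.Dixon, J.Novak, T.Oonishi, S.Furui
- Organizer
  Special Interest Group on Spoken Language Processing [音声言語情報処理研究会]
- Place of Presentation
  Tokyo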
- Year and Date
  2009-12-21
[Presentation] Evaluation of a WFST-based ASR system for train timetable information (2009)
- Author(s)
  J.Novak, E.Whittaker, S.Furui
- Organizer
  APSIPA
- Place of Presentation
  Sapporo, Japan
- Year and Date
  2009-10-06
[Presentation] Development of a WFST based speech recognition system for a resource deficient language using machine translation (2009)
- Author(s)
  A.Jensson, T.Oonishi, K.Iwano, S.Furui
- Organizer
  APSIPA
- Place of Presentation
  Sapporo, Japan
- Year and Date
  2009-10-05
[Presentation] Recent development of WFST-based speech recognition decoder (2009)
- Author(s)
  P.Dixon, T.Oonishi, K.Iwano, S.Furui
- Organizer
  APSIPA
- Place of Presentation
  Sapporo, Japan
- Year and Date
  2009-10-05
[Presentation] Rapid development of a grapheme-to-phoneme system based on weighted finite state transducer (WFST) framework (2009)
- Author(s)
  D.Yang, P.Dixon, S.Furui
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  Fukushima Prefecture
- Year and Date
  2009-09-17
[Presentation] Recent functionality improvements to the T^3 speech decoder (2009)
- Author(s)
  P.Dixon, T.Oonishi, S.Furui
- Organizer
  Acoustical Society of Japan Autumn Meeting
- Place of Presentation
  Fukushima Prefecture
- Year and Date
  2009-09-17
[Presentation] Robust speech recognition using VAD-measure-embedded decoder (2009)
- Author(s)
  T.Oonishi, P.Dixon, K.Iwano, S.Furui
- Organizer
  INTERSPEECH
- Place of Presentation
  Brighton, UK
- Year and Date
  2009-09-09
[Presentation] Generalization of specialized on-the-fly composition (2009)
- Author(s)
  T.Oonishi, P.Dixon, K.Iwano, S.Furui
- Organizer
  ICASSP
- Place of Presentation
  Taipei, Taiwan
- Year and Date
  2009-04-22
[Presentation] Fast acoustic computations using graphics processors (2009)
- Author(s)
  P.Dixon, T.Oonishi, S.Furui
- Organizer
  ICASSP
- Place of Presentation
  Taipei, Taiwan
- Year and Date
  2009-04-22
