Improvement of audio-visual speech recognition using multi-modal cooperation and integration techniques

Research Project

Project/Area Number	18700175
Research Category	Grant-in-Aid for Young Scientists (B)
Allocation Type	Single-year Grants
Research Field	Perception information processing/Intelligent robotics
Research Institution	Gifu University
Principal Investigator	TAMURA Satoshi Gifu University, Faculty of Engineering, Associate Professor (10402215)
Project Period (FY)	2006 – 2008
Project Status	Completed (Fiscal Year 2008)
Budget Amount	¥3,650,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥150,000) Fiscal Year 2008: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000) Fiscal Year 2007: ¥600,000 (Direct Cost: ¥600,000) Fiscal Year 2006: ¥2,400,000 (Direct Cost: ¥2,400,000)
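Keywords	multimodal speech recognition / information integration / information cooperation / microphone array / multimodal VAD / speech recognition / multimodal / voice activity detection / visual features / camera array / real-time
Research Abstract	This project aimed to improve the performance of multimodal speech recognition, which uses speech together with video of the speaker's lips during utterance. To that end, it examined a range of information cooperation methods, in which the audio and visual streams each make use of the other's information, and information integration methods, which combine audio and visual information effectively. As a result, the project not only achieved improved recognition performance but also yielded many new findings on information cooperation and information integration.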

Report

(4 results)

2008 Annual Research Report Final Research Report ( PDF )
2007 Annual Research Report
2006 Annual Research Report

Research Products
(21 results)

Presentation (20 results), Remarks (1 result)

[Presentation] Multimodal voice activity detection using lip video in real environments (2009)
- Author(s)
  竹内伸一,羽柴隆志,田村哲嗣,速水悟
- Organizer
  Proceedings of the Acoustical Society of Japan 2009 Spring Meeting, 3-5-8, pp. 119-120
- Year and Date
  2009-03-19
- Related Report
  2008 Final Research Report
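[Presentation] Multimodal voice activity detection using lip video in real environments (2009)
- Author(s)
  竹内、羽柴、田村、速水
- Organizer
  Acoustical Society of Japan 2009 Spring Meeting
- Place of Presentation
  Tokyo Institute of Technology, Ookayama Campus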
- Year and Date
  2009-03-19
- Related Report
  2008 Annual Research Report
[Presentation] Voice activity detection from audio and video using multi-stream HMMs (2009)
- Author(s)
  羽柴隆志,竹内伸一,田村哲嗣,速水悟
- Organizer
  Proceedings of the Acoustical Society of Japan 2009 Spring Meeting, 1-P-5, pp. 131-132
- Year and Date
  2009-03-17
- Related Report
  2008 Final Research Report
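[Presentation] Voice activity detection from audio and video using multi-stream HMMs (2009)
- Author(s)
  羽柴、竹内、田村、速水
- Organizer
  Acoustical Society of Japan 2009 Spring Meeting
- Place of Presentation
  Tokyo Institute of Technology, Ookayama Campus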
- Year and Date
  2009-03-17
- Related Report
  2008 Annual Research Report
[Presentation] Improving multimodal speech recognition by normalizing visual features (2008)
- Author(s)
  石川雅人,田村哲嗣,速水悟
- Organizer
  IEICE Technical Report, SP2008-71, vol. 108, no. 312, pp. 7-12
- Year and Date
  2008-11-20
- Related Report
  2008 Final Research Report
[Presentation] An investigation of audio-visual synchronization in multimodal speech recognition (2008)
- Author(s)
  田村哲嗣,石川雅人,速水悟
- Organizer
  IEICE Technical Report, SP2008-70, vol. 108, no. 312, pp. 1-6
- Year and Date
  2008-11-20
- Related Report
  2008 Final Research Report
[Presentation] An investigation of audio-visual synchronization in multimodal speech recognition (2008)
- Author(s)
  田村、石川、速水
- Organizer
  IEICE Technical Report
- Place of Presentation
  Softopia Japan
- Year and Date
  2008-11-20
- Related Report
  2008 Annual Research Report
[Presentation] Improving multimodal speech recognition by normalizing visual features (2008)
- Author(s)
  石川、田村、速水
- Organizer
  IEICE Technical Report
- Place of Presentation
  Softopia Japan
- Year and Date
  2008-11-20
- Related Report
  2008 Annual Research Report
[Presentation] CENSREC-AV: Evaluation frameworks for audio-visual speech recognition (2008)
- Author(s)
  田村哲嗣,宮島千代美,北岡教英,速水悟,武田一哉
- Organizer
  Proc. AVSP 2008, Moreton Island, Australia, pp. 51-54
- Year and Date
  2008-09-27
- Related Report
  2008 Final Research Report
[Presentation] A study of target-signal spectrum extraction using a microphone array (2008)
- Author(s)
  菱川恵利子,田村哲嗣,速水悟
- Organizer
  Proceedings of the Acoustical Society of Japan 2008 Autumn Meeting, 2-8-15, pp. 665-666
- Year and Date
  2008-09-11
- Related Report
  2008 Final Research Report
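[Presentation] A study of multimodal speech recognition using likelihood information from visual HMMs (2008)
- Author(s)
  石川雅人,田村哲嗣,速水悟
- Organizer
  Proceedings of the Acoustical Society of Japan 2008 Autumn Meeting, 1-1-24, pp. 57-58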
- Year and Date
  2008-09-10
- Related Report
  2008 Final Research Report
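[Presentation] Improving multimodal speech recognition by normalizing visual features (2008)
- Author(s)
  石川、田村、速水
- Organizer
  Acoustical Society of Japan 2008 Autumn Meeting
- Place of Presentation
  Kyushu University, Ohashi Campus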
- Year and Date
  2008-09-10
- Related Report
  2008 Annual Research Report
[Presentation] Multimodal speech recognition using audio and visual confusion networks (2007)
- Author(s)
  上澤泰,石川雅人,田村哲嗣,速水悟
- Organizer
  IEICE Technical Report, SP2007-92, vol. 107, no. 356, pp. 37-42
- Year and Date
  2007-11-28
- Related Report
  2008 Final Research Report
[Presentation] Multimodal speech recognition using audio and visual confusion networks (2007)
- Author(s)
  上澤泰, 田村哲嗣, 速水悟
- Organizer
  IEICE Speech Technical Committee meeting, November 2007
- Place of Presentation
  Chiba Institute of Technology
- Year and Date
  2007-11-28
- Related Report
  2007 Annual Research Report
[Presentation] A study of multimodal speech recognition using audio-visual CNC (2007)
- Author(s)
  上澤泰,石川雅人,田村哲嗣,速水悟
- Organizer
  Proceedings of the Acoustical Society of Japan 2007 Autumn Meeting, 2-8-2, pp. 111-112
- Year and Date
  2007-09-20
- Related Report
  2008 Final Research Report
[Presentation] Prototyping an object-oriented speech recognition decoder (2007)
- Author(s)
  田村哲嗣, 速水悟
- Organizer
  Acoustical Society of Japan 2007 Autumn Meeting
- Place of Presentation
  University of Yamanashi
- Year and Date
  2007-09-20
- Related Report
  2007 Annual Research Report
[Presentation] A study of multimodal speech recognition using audio-visual CNC (2007)
- Author(s)
  上澤泰, 田村哲嗣, 速水悟
- Organizer
  Acoustical Society of Japan 2007 Autumn Meeting
- Place of Presentation
  University of Yamanashi
- Year and Date
  2007-09-20
- Related Report
  2007 Annual Research Report
[Presentation] A study on building a real-time multimodal speech recognition system (2007)
- Author(s)
  田村哲嗣,速水悟
- Organizer
  Proceedings of the Acoustical Society of Japan 2007 Spring Meeting, 2-9-14, pp. 63-64
- Year and Date
  2007-03-14
- Related Report
  2008 Final Research Report
[Presentation] Development of a meeting recording system integrating SOS and a microphone array (2006)
- Author(s)
  木村文彦,近藤功一,田村哲嗣,速水悟,山本和彦
- Organizer
  IPSJ SIG Technical Report, 2006-SLP-63-2, vol. 2006, no. 107, pp. 7-12
- Year and Date
  2006-10-20
- Related Report
  2008 Final Research Report
[Presentation] Improving visual information with action units for multimodal speech recognition (2006)
- Author(s)
  上澤泰,田村哲嗣,速水悟
- Organizer
  Proceedings of the Acoustical Society of Japan 2006 Autumn Meeting, 1-2-25, pp. 49-50
- Year and Date
  2006-09-13
- Related Report
  2008 Final Research Report
[Remarks] Website, etc.
- URL
  http://hym.info.gifu-u.ac.jp/~tamura/multimodal.html
- Related Report
  2008 Final Research Report
